Quasi-fair arbitration scheme with default owner speedup.



A decentralized, pipelined, synchronous bus arbitration scheme which
allows almost completely fair arbitration between multiple devices
competing for the use of a communication bus while allowing the
device that last used the bus faster access to the bus if no other
device is competing for its use. The arbitration method and apparatus
according to the present invention allows all devices that participate
in arbitration equal access to the bus with the exception that when
bus requests are posted simultaneously the device with the higher
priority will always be granted use of the bus first.
FIELD OF THE INVENTION



The present invention relates to computer sys- 
tem arbitration elements, in particular, arbitration 
elements providing a fair allocation of system re- 
sources. 



BACKGROUND OF THE INVENTION 



Computer systems include a variety of processing units seeking
access to and control of system resources. Arbitration elements that
allocate by strict priority may deny lower priority elements
necessary access. Thus there remains a need for priority allocation
that still avoids a complete lock-out of other, lower priority
elements.



BRIEF DESCRIPTION OF THE INVENTION 



According to the present invention, when a device, A, needs to use
the bus, it asserts its request signal and, during the same cycle,
observes all other devices' request signals. If no other device is
requesting during the cycle, then device A becomes the bus owner the
following cycle. If another device, B, asserts its request signal
during the same first cycle, then the requesting device with the
highest priority is bus owner the following cycle. The lower priority
device will become bus owner immediately following the higher
priority device's last cycle as bus owner. The device with the
highest priority with a bus request asserted always wins bus
ownership.
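
As an illustration of the rule that the highest priority requestor always wins, the following minimal Python sketch picks the winner from a set of request lines. The function name `arbitration_winner` and the convention that a list index doubles as the priority level are assumptions made for this example; they are not part of the disclosed apparatus.

```python
from typing import Optional, Sequence


def arbitration_winner(requests: Sequence[bool]) -> Optional[int]:
    """Return the level of the highest-priority asserted request, or None.

    Assumption for illustration: requests[i] is the request line of the
    device at priority level i, and a larger i means higher priority.
    """
    # Scan from the highest level downwards; the first asserted line wins.
    for level in range(len(requests) - 1, -1, -1):
        if requests[level]:
            return level
    return None  # no device is requesting in this cycle
```

With devices at levels 0 and 2 requesting in the same cycle, the sketch selects level 2 as the owner for the following cycle.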

During the final cycle of bus ownership the bus 
owner "snapshots" the state of all the request 
signals belonging to lower priority devices and will 
not reassert its request signal until all of the re- 
quests that were captured during the snapshot are 
or will be satisfied. 

If, during a device's last cycle as owner or during subsequent
cycles, device A requires another bus transfer and no other device
has requested the bus, then the owner becomes the default owner and
need not assert its request signal, thus allowing it access to the
bus a cycle sooner than if it were required to assert its request
signal.

The arbitration apparatus is redundantly distrib- 
uted to provide every device with a complete ar- 
bitration mechanism which is identical among all 
the devices participating. 

In summary, this technique allows multiple devices equitable access
to a bus using a minimum of control signals while minimizing the
cycles used for arbitration.

BRIEF DESCRIPTION OF THE DRAWING 



These and other features of the present invention will be better
understood by reading the following detailed description of the
invention, taken together with the drawing wherein:

Fig. 1 is a block diagram of a computer system embodiment of the
present invention;

Fig. 2 is a block diagram of one embodiment of a bus interface unit;

Fig. 3 is a block diagram of the interconnection of one embodiment of
the lock acquisition and bus arbitration blocks of the bus interface;

Fig. 4 is a schematic diagram having further detail of the lock
acquisition and requesting blocks 74 and 72 of the embodiment of
Figs. 1 and 2; and

Fig. 5 is further detail of the bus arbitration block 75 of the bus
interface of the embodiment of Figs. 1 and 2.

Further details of one embodiment of the present invention are
provided in the appendix, wherein:

Appendix I provides a processor bus interface specification;

Appendix II provides a bus signal specification; and

Appendix III provides further structural description of the
processor-bus interface.

DETAILED DESCRIPTION OF THE INVENTION 



As shown in Fig. 1, the processors 52, 54, and 56 and memory units 66
and 68 are devices connected to a bus 58 via interface elements 70,
72, 74, 75, 76 and 78, described in more detail with regard to bus
signalling in APOLL-111XX, entitled MULTIPROCESSOR INTERLOCK, filed
concurrently herewith and incorporated by reference. Initially assume
unit 68 is a default bus owner.

All bus 58 interfaces (BIF) except the default owner must request the
bus prior to use. There is one bus request level on the backplane per
bus device. Devices are grouped into two classes. Class A devices are
awarded the bus in strict priority order. Class B devices participate
in fair arbitration and may also be default bus owners. Processors
52, 54 and 56 are class B devices.

Bus arbitration is redundant and decentralized. Every bus interface
decides for itself whether it has won access to the bus 58. Bus
arbitration can be inhibited by the assertion of the arb inhibit
signal on leads 63. Only the current owner of the bus may assert arb
inhibit. The current owner will do so if the intended bus transfer
requires multiple cycles.

If a class A device 68 requests the bus, it will assert both its
assigned request level 61 and the ARB_INHIBIT_B line on the bus. When
the BIF detects the assertion of ARB_INHIBIT_B in an active bus
arbitration cycle, the BIF will defer to the class A device(s).

The class B devices, the processors 52, 54, and 56, also have a fixed
priority assignment. Potential assignments are 0 through 3, with 3
being the highest priority. The assignment is scanned into the BIF
and is used to determine which of the four class B request parallel
backplane signals this particular processor is to use. Each processor
will drive its assigned level, and defer to requestors at higher
levels.

Fair arbitration is approximated by class B devices agreeing not to
reassert their request lines on demand. Rather, a class B device will
"snapshot" or store all other lower priority class B request lines in
the final cycle of a bus ownership. The class B device will then
relinquish the bus and not reassert a request line until all the
snapshotted requests are, or are about to be, satisfied. The class B
device determines the other requestors have been serviced by
observing the current state of the other request lines. If a request
line is deasserted, service is underway or completed. If a request
line is still asserted, but arbitration is enabled and that requestor
will win, service is presumed.
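
The snapshot rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the helper names (`next_winner`, `update_snapshot`, `may_reassert`) are hypothetical, request lines are a list indexed by priority level with a higher index meaning higher priority, and the snapshot is re-evaluated once per bus cycle.

```python
from typing import List, Optional


def next_winner(requests: List[bool], arb_enabled: bool) -> Optional[int]:
    """Level that will win the next cycle, or None (illustrative helper)."""
    if not arb_enabled:
        return None
    for level in reversed(range(len(requests))):
        if requests[level]:
            return level
    return None


def update_snapshot(snapshot: List[bool], requests: List[bool],
                    arb_enabled: bool) -> List[bool]:
    """Clear snapshot bits whose request is satisfied or about to be.

    A snapshotted request stays pending only while its line is still
    asserted and it is not the requestor that wins the next cycle.
    """
    winner = next_winner(requests, arb_enabled)
    return [pending and requests[i] and i != winner
            for i, pending in enumerate(snapshot)]


def may_reassert(snapshot: List[bool]) -> bool:
    """The former owner may request again once no snapshot bit remains."""
    return not any(snapshot)
```

For example, a level 2 device that captured pending requests at levels 0 and 1 keeps deferring until level 1 and then level 0 have each been granted the bus.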

When the bus 58 is otherwise idle, the last 
successful bidder among the class B requestors is 
also established as the default bus owner. The 
default bus owner may use the bus at the end of 
any cycle in which no other request line was as- 
serted. The default bus owner does not have to 
assert its assigned request line. The default re- 
mains in effect until another class B device wins 
the bus. 

A class B device's bus ownership may be suspended by a class A
device. If a class A device assumes control of the bus, the class B
device that was the former owner waits for the bus to again become
idle. The class B device then reclaims bus ownership; i.e., the class
B device reassumes the ownership in the cycle following one in which
arbitration was permitted, but no request line was asserted. If
another class B device wins the bus before the bus becomes idle,
default bus ownership is transferred.

When a BIF first asserts a bus request line, it will start a timer
70. If the timer elapses before the bus is acquired, a bus
acquisition timeout occurs. The bus timeout duration is approximately
3.2 milliseconds. If a timeout occurs, the system is assumed broken
and a clock (not shown) freeze request is made.

The timer 70 is not stopped until a request is confirmed to complete
or fail; the timer will therefore expire if a device is continually
busy. Broadcast transfers, such as TB invalidates, will stop the
timer regardless of the acknowledge line state. The same timer 70 is
reused for read data return monitoring.

As shown in Fig. 2, internal to the BIF 80 are multiple competing
local requestors: data cache 82 read, data cache write and
instruction cache 84 read. Any number of data cache writes, up to the
limit of the write queue size, may be posted and awaiting transfer on
the bus. Only a single read may be posted from each of the read
request sources: the data cache read and the instruction cache read.
In general, data cache read will be prioritized over instruction
cache read. In turn, instruction cache read will be prioritized over
data cache write. However, the following exceptions exist:

- if the write data queue is full, data cache write is prioritized
over instruction cache miss;

- if a data cache miss collides in address with a previously queued
write, data cache write is given priority over both data and
instruction cache miss;

- if a write and unlock is queued, data cache write is given priority
over both data and instruction cache miss;

- if a data cache miss from an unencacheable memory location is
posted, data cache write is given priority over both data and
instruction cache miss;

- if a data cache miss and lock is posted, data cache write is given
priority over both data and instruction cache read;

- if a data cache miss and unlock is posted, data cache write is
given priority over both data and instruction cache read;

- if a tb invalidate is queued in the write buffer, data cache write
is given priority over both instruction and data cache miss.

A fourth source of request, for the return of read data, to itself,
is given precedence over all other transmitters.
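
The local prioritization above can be summarized as a small selection function. The sketch below is a simplification under stated assumptions: the `LocalState` fields and the function name are hypothetical, and the six exceptions that promote data cache write over both reads (address collision, queued write and unlock, unencacheable miss, miss and lock, miss and unlock, queued tb invalidate) are collapsed into a single `write_first` flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalState:
    """Hypothetical summary of the BIF's pending local requests."""
    dcache_read: bool = False       # data cache read (miss) posted
    icache_read: bool = False       # instruction cache read (miss) posted
    write_queued: bool = False      # at least one write in the write queue
    write_queue_full: bool = False  # write data queue is full
    write_first: bool = False       # any exception promoting writes over both reads
    read_return: bool = False       # return of read data is pending


def select_local_request(s: LocalState) -> Optional[str]:
    """Pick the next local requestor, following the order in the text."""
    if s.read_return:                          # precedence over all others
        return "read_return"
    if s.write_queued and s.write_first:       # write beats both reads
        return "dcache_write"
    if s.dcache_read:                          # default top priority
        return "dcache_read"
    if s.write_queued and s.write_queue_full:  # full queue beats icache miss
        return "dcache_write"
    if s.icache_read:
        return "icache_read"
    if s.write_queued:
        return "dcache_write"
    return None
```

Collapsing the exceptions into one flag keeps the ordering visible; a fuller model would derive that flag from the individual queue and miss conditions.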

The BIF will issue on the bus subsequent requests from the data cache
no more often than every other bus cycle. This is required to assure
write order between processors, and read-write order within one.
Further details of system bus protocol relating to the reject signal
are provided in APOLL-113XX, entitled PIPELINE COMPUTER SYSTEM HAVING
WRITE ORDER PRESERVATION, filed concurrently herewith and
incorporated herein by reference. The instruction cache miss request
is not restricted to every other cycle. In the cases of load and
lock, load and unlock, and store and unlock, subsequent requests are
not issued until a successful bus acknowledge of the prior request is
received.

The bus interface (BIF) accepts load lock, load unlock and store
unlock commands from the Memory Management Unit 86 (MMU). When load
lock completes successfully, that processor can be assured of holding
the bus lock until the processor explicitly releases the lock or an
error arises. Only one processor at a time may hold the bus lock and
that, in turn, permits the construction of critical code sections in
a multiple processor environment. Further details are provided in
APOLL-111XX, incorporated by reference.

The BIF will secure the bus lock only when a load lock data cache
miss is successfully transferred and acknowledged on the bus. In more
detail, first the data cache miss which seeks the bus lock is posted.
This request will push ahead of itself all previously queued up
writes. When the lock request is next to be serviced, the current
state of the external bus lock signal is examined. If lock is already
asserted by another processor, the arbitration is deferred. If the
bus lock is available, arbitration is attempted. If the bus lock
signal is subsequently asserted before the BIF gains access to the
bus, the BIF will withdraw from further arbitration. When the bus is
finally secured, the ARB_INHIBIT_A, ARB_INHIBIT_B and lock signals
are simultaneously asserted. ARB_INHIBIT_A and ARB_INHIBIT_B remain
asserted for 3 cycles, which is sufficient time for all other bus
interfaces to see the lock signal asserted and to withdraw from
arbitration if they too plan to secure the bus lock. At the end of 3
cycles, the locking BIF will also examine the state of the
acknowledge signals. If other than a successful acknowledge is
detected, the bus lock is immediately released. If released, the lock
signal is deasserted at the end of the cycle following the
acknowledge.

The BIF will release the bus lock when a load unlock or a store
unlock is successfully issued and acknowledged. Alternatively, the
lock is released upon an error in the local processor. A local
processor error is assumed to result in a processor trap, and the
signal trap dispatch, which so indicates, is therefore used to
unconditionally release the bus lock. In more detail, first the data
cache read or write which seeks to release the bus lock is posted.
This request will push ahead of itself all previously queued up
writes. At the end of 3 cycles, the locking BIF will also examine the
state of the acknowledge signals. If other than a successful
acknowledge is detected, the bus lock is retained. Otherwise, the
lock signal is deasserted at the end of the cycle following the
acknowledge.
If a lock request is REJECTed by the BIF, as discussed below, the
lock signal 72 and ARB_INHIBIT_A and ARB_INHIBIT_B 63 are immediately
released. Similarly, if an unlock request is REJECTed by the BIF, the
lock is retained if held.

Two successive bus address transfers may be issued by the BIF in bus
cycles spaced apart by only one NOP or foreign cycle. If the first
request receives a busy acknowledge, the acknowledge is received only
after the second request has been sent. In this case, the bus REJECT
signal on lead 65 is immediately asserted. The REJECT signal is
interpreted by the slave as nullifying the already accepted request.
This use of REJECT assures that the order of transfers on the bus is
retained. This is particularly important when the second request is a
read for the same data that is being written by the first request.
When REJECT is asserted, the acknowledge for the second request is
ignored. When REJECT is asserted, all transaction side effects, such
as bus locking, do not take place.

It is possible for the MMU to request the bus lock for PMAPE update
while the BIF is already in possession of the bus lock. For this
reason, a second load lock request will be accepted. If two bus lock
requests have been accepted, two bus unlock requests will need to
follow before the lock will really be released. Thus, according to
one embodiment of the present invention, the BIF nests bus lock
requests two levels.
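
The two-level nesting behaves like a small counter. The following Python sketch is illustrative only: the class and method names are hypothetical, and refusing a third lock request is an assumption, since the text does not say what happens beyond two levels.

```python
class BusLockNesting:
    """Counts accepted load lock requests, nested at most two deep."""

    MAX_DEPTH = 2  # the BIF nests bus lock requests two levels

    def __init__(self) -> None:
        self.depth = 0

    @property
    def held(self) -> bool:
        """True while at least one accepted lock request is outstanding."""
        return self.depth > 0

    def load_lock(self) -> bool:
        """Accept a lock request; return False if nesting is exhausted."""
        if self.depth >= self.MAX_DEPTH:
            return False  # assumption: a third request is not accepted
        self.depth += 1
        return True

    def unlock(self) -> bool:
        """Process an unlock; return True only when the lock is really released."""
        if self.depth == 0:
            return False  # nothing held, nothing to release
        self.depth -= 1
        return self.depth == 0
```

After two accepted lock requests, the first unlock leaves the lock held and only the second unlock releases it.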

The BIF starts a timer when the bus lock is first acquired. The timer
remains running so long as the BIF holds the bus lock. If the timer
expires before the lock is released, a lock timeout trap is posted.
The timer duration is approximately 200 microseconds. If a timeout
trap occurs, a corresponding register (not shown) indicates so. If a
second lock setting request is processed before a held lock is
released, the timer is not reset. This results in a somewhat shorter
timeout for the second request. If an unlock request is being
transferred upon the bus, the BIF refrains from arbitration for a new
lock request for at least five cycles, including the transferring
one. This delay assures that there will always be two cycles of delay
between the release of a lock and its reacquisition by the same BIF.

The BIF will retry any request that receives a BUSY acknowledge. The
retry will continue until the bus timeout expires. If an address
transfer receives a BUSY acknowledge, the request is marked as in
retry. There can be as many as three requests in retry at any one
time.

The use of REJECT, in cooperation with the write order assurance of
the write queue, guarantees that the write order of one processor is
always preserved as seen by a second processor. This can permit
alternate multiprocessor synchronization without the need for bus
locking.

As shown in Fig. 3, the arbitration and lock control blocks of the
bus interface attach to both the system's bus 58 and the processor's
local request generation logic 73. A brief glossary of the signals
generated or received by the local request generation logic follows:

NEED_LOCK is asserted to identify that the next processor read to be
serviced requires the acquisition of the bus lock.

CONFIRM_LOCK_HELD is asserted to identify that the processor "read
and lock" which just took place has been properly acknowledged on the
bus. This signal handles the situation that a bus operation may fail
to complete successfully even though arbitration succeeds.

RELEASE_LOCK is asserted when the processor wishes to abandon the bus
lock. The processor chooses to do so when a "read and unlock" or
"write and unlock" operation has been properly acknowledged on the
bus. The processor may also choose to do so if there has been a local
error such as lock holding duration timeout.

ARB_WIN is asserted by the bus arbitration logic 75 when the
processor has been awarded the right to transfer on the bus 58.

MYXFER is asserted by the address/data transfer logic 78 of the bus
interface when an address or data transfer is underway.

NEED_BUS is asserted by the processor when there is a pending and
unserviced processor read or write.

WILL_NEED_BUS is asserted by the processor when there "will be" a
pending and unserviced read or write in the next cycle. The advance
warning of the need for service permits the early assertion of a bus
request signal.

MULTICYC_INHIBIT is asserted by the address/data transfer logic when
a request is underway that requires the sustained and uninterrupted
use of the bus.

Also as shown in Fig. 3, there are a number of bus control signals
involved in the locking and arbitration proposal. A glossary follows:

LOCK_REQUEST- (62) is asserted by a processor when it wishes access
to the bus lock and is not blocked from acquiring the lock for
fairness reasons.

LOCK_HELD- (64) is asserted by a processor when it holds the bus
lock.

BR3-, BR2-, BR1- and BR0- (61) are the four bus request lines
associated with the four processors.

ARB_INHIBIT_B- (63B) is asserted when the "B" level bus requestors
are to be inhibited from arbitrating for the bus.

ARB_INHIBIT_A- (63A) is asserted when the "A" level bus requestors
are to be inhibited from arbitrating for the bus.

The signal LOCK_ARB_ENAB is asserted and driven by the lock
acquisition and request block 202 (72) to the bus arbitration block
(75) to indicate that a processor (52) request may proceed.

The lock arbitration and request block is shown in more detail in
Fig. 4. There are four state elements: 250, 252, 254 and 256, which
drive and interpret the bus control signals LOCK_REQUEST- and
LOCK_HELD-. When the processor requires a bus lock, it indicates the
need by asserting the signal NEED_LOCK. NEED_LOCK will cause the
state element 250 to be set if not inhibited from doing so by state
element 252 in gate 258. If 250 is set, gate 260 will drive the open
collector signal LOCK_REQUEST- on the backplane. NEED_LOCK is assumed
to be deasserted when the processor has been granted access to the
bus so that the request is withdrawn at the correct time. State
element 252 inhibits the assertion of LOCK_REQUEST- if this processor
had once held the bus lock during the duration of time when
LOCK_REQUEST- has been uninterruptedly asserted, providing the basis
for the fairness in the acquisition of the bus lock. LOCK_DEFER
inhibits this processor from asserting the LOCK_REQUEST- signal, as
well as inhibiting this processor from acquiring the bus as described
in the next paragraph. This LOCK_DEFER situation as recorded in 252
is set when the CONFIRM_LOCK_HELD signal is presented to gate 262.
Gate 262 also sustains the LOCK_DEFER situation for the duration of
the assertion by this processor of LOCK_HELD by state element 254
and/or for the uninterrupted assertion of the external LOCK_REQUEST-
signal. The open collector signal LOCK_HELD- is driven by gate 264
whenever state element 254 is set. The state element 254 is set when
the processor is awarded the bus, i.e., ARB_WIN is asserted, and the
processor needs the bus lock, i.e., NEED_LOCK is asserted. Gate 266
determines this. Gate 266 also sustains the lock holding until the
RELEASE_LOCK signal is presented by the processor. State element 256
is set whenever the bus is locked for access by another processor.
Gate 268 determines this situation by noting that the LOCK_HELD-
signal is asserted, but the local lock holding state element 254 is
not set. When 256 is set, a lock requiring processor read cannot be
allowed to proceed. This determination is made by the combination of
the gates 270 and 272 and presented to the bus arbitration logic in
the signal LOCK_ARB_ENAB. LOCK_ARB_ENAB is always set when the
processor does not need the bus lock, i.e., NEED_LOCK is deasserted.
Alternatively, LOCK_ARB_ENAB is set when the bus is not locked, i.e.,
state element 256 is not set, and either of two conditions prevail
according to gate 272. The first condition is simply that this
processor already holds the bus lock, i.e., state element 254 is set.
The second condition is that there is no lock acquisition fairness
deference in effect, i.e., LOCK_DEFER driven by state element 252 is
not asserted.
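
The LOCK_ARB_ENAB conditions reduce to a single Boolean expression. The Python sketch below restates them with active-high booleans; the function and parameter names are assumptions chosen to mirror the signals and state elements named in the text, not names taken from the drawing.

```python
def lock_arb_enab(need_lock: bool, locked_by_other: bool,
                  lock_held_local: bool, lock_defer: bool) -> bool:
    """Model of the combination of gates 270 and 272.

    need_lock        -- NEED_LOCK from the processor
    locked_by_other  -- state element 256 (another processor holds the lock)
    lock_held_local  -- state element 254 (this processor holds the lock)
    lock_defer       -- LOCK_DEFER from state element 252
    """
    if not need_lock:
        return True  # always enabled when no bus lock is needed
    # Otherwise the bus must not be locked by another processor, and this
    # processor either already holds the lock or is not deferring.
    return (not locked_by_other) and (lock_held_local or not lock_defer)
```

A processor that needs the lock while fairness deference is in effect is therefore kept out of arbitration unless it is already the lock holder.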

The bus arbitration and request block is shown in more detail in
Fig. 5. For purposes of simplicity, this block is drawn as if the
processor was permanently affixed to bus request level 3. In the
actual implementation, additional logic is present to permit the
processor to request at any level and can be provided according to
the detail of Fig. 5. Also, the current implementation supports only
4 requestors, but there is no fundamental restriction in this number
and a greater or lesser number may be accommodated. In the discussion
to follow, "B level requestors" and "processors (52, 54, 56)" are to
be considered synonymous. However, in other implementations that need
not be so.

In Fig. 5, there are five state elements: 300, 302, 304, 306 and 308,
which drive and interpret the five bus control signals 61 BR3-, BR2-,
BR1-, BR0- and ARB_INHIBIT_B-. State element 300 is the bus request
flipflop. State elements 302, 304 and 306 snapshot the state of the
other processor bus request signals to be used in the fairness
deference algorithm of this processor. State element 308 is the
record of whether this processor is the default owner of the bus.

Gates 320, 322, 324 and 326 determine if one of the four processors
may secure the bus in the next cycle. BR0_WIN is asserted by gate 326
if all higher priority requests (BR3, BR2 and BR1) are not asserted,
and B level request arbitration is not inhibited, i.e., ARB_INHIBIT_B
is not asserted. Similarly, BR1_WIN is asserted by gate 324, BR2_WIN
by gate 322, and BR3_WIN by gate 320. The processor associated with
request level three can only fail to win the bus if ARB_INHIBIT_B is
asserted. ARB_INHIBIT_B- is asserted on the bus, by this processor or
others, for one of two reasons. The first reason is that the current
transfer requires multiple uninterrupted bus cycles. In that case,
both ARB_INHIBIT_B- and ARB_INHIBIT_A- are driven by the address/data
transfer block 78 to suspend all new arbitration for the bus. The
second reason is that an "A" level requestor wishes access to the
bus. If any "A" level device requests the bus, that bus interface
must also drive the signal ARB_INHIBIT_B- to suspend all "B" level
device arbitration. In this manner, "A" level devices are assured
total priority over "B" level ones.
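
The four win signals can be modelled directly from this description. In the Python sketch below, the function name `br_win` and the list layout (index 0 for BR0 through index 3 for BR3, active-high booleans) are assumptions for illustration. As in the text, a level's win signal depends only on the higher levels and on ARB_INHIBIT_B, not on whether that level is itself requesting.

```python
from typing import List


def br_win(requests: List[bool], arb_inhibit_b: bool) -> List[bool]:
    """Return [BR0_WIN, BR1_WIN, BR2_WIN, BR3_WIN] for the next cycle.

    requests[i] is the (active-high) state of bus request line BRi.
    """
    wins = []
    for level in range(len(requests)):
        # A level may win only if no higher priority request is asserted
        # and B level arbitration is not inhibited.
        higher_clear = not any(requests[level + 1:])
        wins.append(higher_clear and not arb_inhibit_b)
    return wins
```

With BR0 and BR2 asserted and no inhibit, BR2_WIN and BR3_WIN are true while BR0_WIN and BR1_WIN are false, so the level 0 requestor waits.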

The bus request flipflop 300 is set when the processor wishes to use
the bus, i.e., WILL_NEED_BUS is asserted, and the processor has not
just secured the use of the bus, i.e., ARB_WIN is not asserted, and
the processor is not deferring to any of the other three processors.
This combination of events is determined by gate 310. Once flipflop
300 is set, gate 312 unconditionally drives the bus signal BR3- so
that other processors may decide arbitration as well. Bus request
deference is in effect if any of the three signals driven by gates
314, 316 or 318 are asserted. Conceptually, these gates are asserted
if the associated bus request signal is currently asserted and the
requestor will not be serviced next, or if the associated bus request
signal had been asserted when this processor had last transferred on
the bus and there has been no service granted since that time.
Specifically, gate 314, for example, will be asserted if BR2 is
asserted and BR2 will not be granted the bus in the next cycle, i.e.,
BR2_WIN is not asserted, and one of two conditions prevail. The first
condition is that the current bus cycle is owned by this processor,
i.e., MYXFER is asserted. The second is that the state element 302 is
set. The state element 302 is set if the condition of BR2 asserted
and BR2_WIN not asserted was true at the time of the last bus
operation by this processor. This combination of conditions assures
that a processor will not reacquire the bus twice in succession
without all other processor bus requestors having an opportunity to
do so as well.
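
The deference gate and the request flipflop condition can be written as two small Boolean functions. This Python sketch follows the "Specifically" description of gate 314; the function names are hypothetical, signals are treated as active-high booleans, and feeding the gate output back as the next value of the snapshot state element is an assumption about how the flipflop is wired.

```python
from typing import Iterable


def deference(br: bool, br_win: bool, myxfer: bool, snapshot: bool) -> bool:
    """Model of one deference gate (314, 316 or 318).

    br       -- the other processor's bus request line (e.g. BR2)
    br_win   -- whether that requestor wins the next cycle (e.g. BR2_WIN)
    myxfer   -- MYXFER, the current bus cycle is owned by this processor
    snapshot -- the matching snapshot state element (e.g. 302)
    """
    return br and not br_win and (myxfer or snapshot)


def set_bus_request(will_need_bus: bool, arb_win: bool,
                    deferences: Iterable[bool]) -> bool:
    """Model of gate 310, which sets the bus request flipflop 300."""
    return will_need_bus and not arb_win and not any(deferences)
```

In this model a snapshot bit would be reloaded each cycle with the value returned by `deference`, so it stays set only while the other request remains asserted and unserviced.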
State element 308, CURRENT_OWNER, is set when this processor is the
last one to transfer on the bus and the element remains set until
another "B" level requestor acquires the bus. Specifically, gate 328
will allow the element to be set if it is already set or the current
transfer belongs to this processor (as decided by gate 333) and no
other processor will acquire the bus in the next cycle. Other
processors may not acquire the bus either because ARB_INHIBIT_B is
asserted or because no other processor is requesting the bus. These
events are combined in gate 330, with gate 332 detecting the absence
of other "B" level requests.

Finally, ARB_WIN is asserted if this processor is granted access to
the bus in the next cycle. Gate 336 drives the signal if the lock
acquisition and request blocks 74 and 72 drive the LOCK_ARB_ENAB
signal and the processor otherwise is awarded the bus. This
qualification assures that a processor will not get access to a
locked bus if the processor also requires lock acquisition. Gate 334
decides whether the processor is otherwise awarded the bus. The
processor may be so awarded for two reasons. In the first case, it is
awarded the bus if the bus is needed (NEED_BUS), the associated bus
request line is asserted (BR3), and the bus prioritization logic says
there is no higher priority requestor (BR3_WIN). The second situation
is the one of default ownership. Again, the bus must be needed
(NEED_BUS), there must be no ARB_INHIBIT_B in effect, and this
processor must be the default owner as already decided by gate 328.
Gate 334 combines all of these events.
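
The default ownership and ARB_WIN conditions can be combined into a short model. The Python sketch below is a reading of the gate descriptions under stated assumptions: the function and parameter names are hypothetical, all signals are active-high booleans, and `default_owner` is taken to be the output of gate 328.

```python
from typing import Iterable


def default_owner_next(current_owner: bool, my_transfer: bool,
                       arb_inhibit_b: bool,
                       other_requests: Iterable[bool]) -> bool:
    """Model of gates 328, 330 and 332 feeding state element 308."""
    # No other processor acquires the bus next cycle if B level
    # arbitration is inhibited or no other B level request is asserted.
    no_other_acquires = arb_inhibit_b or not any(other_requests)
    return (current_owner or my_transfer) and no_other_acquires


def arb_win(need_bus: bool, my_request: bool, my_level_win: bool,
            arb_inhibit_b: bool, default_owner: bool,
            lock_arb_enab: bool) -> bool:
    """Model of gates 334 and 336."""
    # Case 1: the processor requested the bus and no higher level is ahead.
    by_request = need_bus and my_request and my_level_win
    # Case 2: default ownership, no request line needs to be asserted.
    by_default = need_bus and not arb_inhibit_b and default_owner
    # Gate 336 qualifies the award with LOCK_ARB_ENAB.
    return lock_arb_enab and (by_request or by_default)
```

The second case is where the speedup comes from: a default owner is granted the bus without first spending a cycle driving its request line.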

Modifications and substitutions of the present invention by one of
ordinary skill in the art are considered to be within the scope of
the present invention, which is not to be limited except by the
claims which follow.



Claims 

1. A method of bus arbitration for providing bus ownership by a
selected one of a plurality of connected devices having differing
levels of bus ownership priority, comprising the steps of:
generating a bus request signal by a first device at a specified
cycle;
generating a bus request signal by a second device at said specified
cycle;
granting ownership of said bus to the one of said first and second
devices having a higher priority;
storing a signal representative of the bus request signals belonging
to lower priority devices at the last cycle of ownership of the
higher priority device bus ownership;
granting ownership of said bus to the other of said first and second
devices at the conclusion of the last cycle of said higher priority
device; and
withholding generation of a (subsequent) bus request signal by the
one of said first and second devices having a higher priority until
the bus requests corresponding to the stored representative signal
have received and completed ownership of said bus.

2. The method of claim 1 further including the step of generating an
arbitration inhibit signal by the current owner for bus ownership
requiring multiple cycles, wherein generation of bus request signals
by other devices is inhibited.

3. The method of claim 1 further including the step of granting
default ownership to the last owner of said bus.

4. Apparatus for arbitrating bus ownership by one of a plurality of
access requesting devices having differing levels of bus ownership
priority, comprising:
means for providing a first device bus request signal;
means for providing a second device bus request signal;
means for granting ownership of said bus to the one of said first and
second devices having a higher priority;
means for storing the bus request signals of lower priority devices
at the last cycle of ownership of the higher priority device bus
ownership;
means for granting ownership of said bus to the other of said first
and second devices at the conclusion of the last cycle of said higher
priority device bus ownership; and
means for withholding generation of a bus request signal by the one
of said first and second devices having a higher priority until the
bus requests corresponding to the stored bus request signals of lower
priority devices have received and completed ownership of said bus.

5. The apparatus of claim 4 wherein said higher priority device
includes means for generating an arbitration inhibit signal for
inhibiting the generation of bus request signals by lower priority
devices while said higher priority device has bus ownership.

6. The apparatus of claim 5 wherein each of said plurality of devices
includes means for granting ownership of said bus, means for storing
and means for generating an arbitration inhibit signal.

7. The apparatus of claim 6 wherein said means for granting ownership
establishes the last requesting device to which ownership was granted
as the default owner of the bus in succeeding cycles.
CHAPTER 1



OVERVIEW 



1.1 Major Responsibilities



The AT CPU X-Bus interface, BIF, attaches the processor's instruction and data caches to the system
backplane bus. The principal functions of the BIF unit are:

o to support the X-Bus reads necessary to fill the instruction and data caches.

o to queue and deliver processor stores to the X-BUS, isolating the CPU from X-BUS write
latencies.

o to act as a bus watcher and ensure cache coherency in the face of external stores.

o to act as a clearing house for system communications to and from the CPU such as
interrupts.

o to maintain and check CPU cache data parity.

In addition, the BIF provides much of the support logic for the self test of the CPU cache RAMs.

The CPU's bus interface is composed principally of 3 gate arrays. The bus interface logic also
includes the instruction and data cache duplicate tag stores, the X-BUS interface transceivers, and
some supporting tristate drivers.

The address gate array, CBA, handles outgoing and inbound address transfers. Outgoing address
transfers occur for instruction and data cache read issue, and for data cache write issue. Inbound
address transfers are required for cache entry invalidation caused by external writes, and for cache
miss filling. The CBA gate array also maintains the duplicate tag stores and handles all bus watching.
Finally, the CBA gate array accepts and forwards interrupt requests to the processor.

The data gate arrays, CBDs, are identical. One is assigned responsibility for the transfer of even
bytes, and the second is assigned the transfer of odd data bytes. The CBD gate arrays queue and
forward write data, and return read data. The CBD gate arrays check and maintain the cache parity.

The following processor block diagram roughly illustrates this partition. A comprehensive block
diagram of the gate array logic alone can be found in Appendix C.



1.2 BIF Overall Block Diagram

1.3 Bus Interconnect

The CPU's bus interface accepts and returns processor addresses from the PA, SASRC, PCSBC, and
VPN buses. The BIF also accepts and returns data from the processor INST and DATA buses. The
X-Bus is the path to main memory used by the BIF.

For a data cache read miss, the physical address is provided to the BIF by the MMU over the PA bus.
The accompanying VPN is captured by the BIF directly from the EAVPN bus. When the cache fill
begins, the cache index is supplied by the BIF to the EASBC bus over the PA bus. The memory data is
supplied directly to the cache DATA bus.

For an instruction cache read miss, the physical address is provided to the BIF by the MMU over the PA
bus. The accompanying VPN is captured by the BIF directly from the PCVPN bus. When the cache fill
begins, the cache index is supplied by the BIF to the PCSBC bus over the PA bus. The memory data is
supplied directly to the cache INST bus.

For a data cache write, the physical address is provided to the BIF by the MMU over the PA bus. The
accompanying VPN is captured by the BIF directly from the EAVPN bus. The store data has previously
been captured by the BIF directly from the DATA bus.

When an external write requires the purging of a local cache entry, the invalidate address is supplied
by the BIF to the MMU over the PA bus.

CHAPTER 2

XBUS INTERFACE

2.1 XBUS Arbitration



All X-Bus interfaces except the default owner must request the bus prior to use. There is one bus
request level on the backplane per X-Bus device. Devices are grouped into two classes. Class A
devices are awarded the bus in strict priority order. Class B devices participate in fair arbitration and
may also be default bus owners. CPU's are class B devices.

Bus arbitration is decentralized. Every bus interface decides for itself whether it has won access to the
X-Bus.

Bus arbitration can be inhibited by the assertion of the arb inhibit backplane signal. Only the current
owner of the bus may assert arb inhibit. The current owner will do so if the intended bus transfer
requires multiple cycles.



2.1.1 Class A Request Override 

If a class A device requests the bus, it will assert both its assigned request level and the bus request
sum line on the bus. When the BIF detects the assertion of bus request sum in an active bus arbitration
cycle, the BIF will defer to the class A device(s).

2.1.2 Class B/CPU Requesting 

The class B devices, the four CPU's, also have a fixed priority assignment. Potential assignments are
0 through 3, with 3 being the highest priority. The assignment is scanned into the BIF and is used to
determine which of the four class B request parallel backplane signals this particular CPU is to use.
The CPU will drive its assigned level, and defer to requestors at higher levels.

Fair arbitration is approximated by class B devices agreeing not to reassert their request lines on
demand. Rather, a class B device will snapshot all other lower priority class B request lines in the final
cycle of a bus ownership. The class B device will then relinquish the bus and not reassert a request
line until all the snapshotted requests are satisfied. The class B device determines the other requestors
have been serviced by observing the current state of the other request lines. If a request line is
deasserted, service is underway or completed. If a request line is still asserted, but arbitration is
enabled and that requestor will win, service is presumed.
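
The snapshot rule above can be sketched as a small state holder. This is only an illustration under stated assumptions, not the BIF logic: the class name `ClassBArbiter`, the use of a set of priority levels to stand for the request lines, and the simplification that a snapshotted request counts as serviced once its line is deasserted are all hypothetical. The "still asserted but will win" case is left out.

```python
class ClassBArbiter:
    """Hypothetical model of one class B device's fairness rule."""

    def __init__(self, level):
        self.level = level      # assigned priority, 0..3 (3 is highest)
        self.pending = set()    # snapshotted lower priority requestors

    def end_of_ownership(self, asserted_levels):
        # Final cycle of ownership: snapshot all lower priority request lines.
        self.pending = {lv for lv in asserted_levels if lv < self.level}

    def observe(self, asserted_levels):
        # Simplification (assumption): a deasserted line means the request
        # is being serviced or has completed.
        self.pending &= set(asserted_levels)

    def may_request(self):
        # Hold off until every snapshotted request has been satisfied.
        return not self.pending
```

A device at level 3 that snapshots requests at levels 0 and 2 would hold off until both lines have been seen deasserted.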

2.1.3 Default Ownership 

When the bus is otherwise idle, the last successful bidder among the class B requestors is also established
as the default bus owner. The default bus owner may use the bus at the end of any cycle in
which no other request line was asserted. The default bus owner does not have to assert its assigned
request line. The default remains in effect until another class B device wins the bus.

A class B device's bus ownership may be "suspended" by a class A device. If a class A device
assumes control of the bus, the class B device that was the former owner waits for the bus to again
become idle. The class B device then reclaims bus ownership; i.e., the class B device reassumes
the ownership in the cycle following one in which arbitration was permitted, but no request line was
asserted. If another class B device wins the bus before the bus becomes idle, default bus ownership
is transferred.

2.1.4 Acquisition Timeout 

When a BIF first asserts a bus request line, it will start a timer. If the timer elapses before the bus is
acquired, a bus acquisition timeout occurs. The bus timeout duration is approximately 3.2 milliseconds
(16 bit counter). If a timeout occurs, the system is assumed broken and a clock freeze request
is made of the SCR. The internal BIF state is preserved insofar as possible.

The timer is not stopped until either a NOACK or ACK acknowledge is received for the request address
transfer. The timer will therefore expire if a device is continually busy. Broadcast transfers, such as
TB invalidates, will stop the timer regardless of the acknowledge line state.

The same timer is reused for read data return monitoring. See section 2.2.2. 
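
As a rough consistency check on the figures in this section, a 16 bit counter that expires after approximately 3.2 milliseconds implies a counter tick of a little under 50 ns. The sketch below assumes the timeout corresponds to the full counter range (2^16 ticks), which the text does not state; the function name is hypothetical.

```python
def implied_tick_ns(timeout_seconds, counter_bits):
    """Tick period in nanoseconds.

    Assumption: the timeout spans the full range of the counter.
    """
    return timeout_seconds / (2 ** counter_bits) * 1e9


tick = implied_tick_ns(3.2e-3, 16)
print(f"{tick:.1f} ns per tick")
```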

2.1.5 Local Request Prioritization

Internal to the BIF are competing local requestors: data cache read, data cache write and instruction
cache read. In general, data cache read will be prioritized over instruction cache read. In turn,
instruction cache read will be prioritized over data cache write. There are exceptions.

• If the write data queue is full, data cache write is prioritized over instruction cache miss.

• If a data cache miss collides in address with a previously queued write, data cache write
is given priority over both data and instruction cache miss.

• If a write to an unencacheable memory location is queued, data cache write is given priority
over both data and instruction cache miss.

• If a write and unlock is queued, data cache write is given priority over both data and
instruction cache miss.

• If a data cache miss from an unencacheable memory location is posted, data cache write is
given priority over both data and instruction cache miss.

• If a data cache miss and lock is posted, data cache write is given priority over both data and
instruction cache read.

• If a data cache miss and unlock is posted, data cache write is given priority over both data
and instruction cache read.

• If a TB invalidate is queued in the write buffer, data cache write is given priority over both
instruction and data cache miss.

A locally generated READ RESPONSE required for a BIF CSR read is given precedence over all other 
transmitters. 
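
The ordering among the three cache requestors can be sketched as a single selection function. This is an illustration only: the function name, the argument names, and the folding of all "write is given priority over both" exceptions into one `write_first` flag are assumptions made for the sketch. The CSR READ RESPONSE precedence is not modeled.

```python
def pick_requestor(dread, iread, dwrite, queue_full=False, write_first=False):
    """Return the next local requestor to service, or None if none is pending.

    dread, iread, dwrite: whether each requestor has a request pending.
    queue_full:  the write data queue is full.
    write_first: any exception that puts data cache write ahead of both
                 read requestors (hypothetical aggregate flag).
    """
    if dwrite and write_first:
        return "data_write"
    if dread:
        return "data_read"
    if dwrite and queue_full:
        # A full write queue only moves writes ahead of instruction misses.
        return "data_write"
    if iread:
        return "inst_read"
    if dwrite:
        return "data_write"
    return None
```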

2.1.6 Subsequent Request Arbitration Delay 

The BIF will issue subsequent requests from the data cache no more often than every other bus cycle.
This is required to assure write order between processors, and read-write order within one. The
instruction cache miss request is not restricted to every other cycle. In the cases of load and lock,
load and unlock, and store and unlock, subsequent requests are not issued until a successful bus
acknowledge of the prior request is received.

May 9, 1988

Apollo Confidential

EP 0 366 434 A2

The BIF will issue subsequent requests from a CPU no more often than every other bus cycle. This is
required to assure write order. It was an implementation convenience to apply it generally. In the
cases of load and lock, load and unlock, and store and unlock, subsequent requests are not issued
until a successful bus acknowledge of the prior request is received.

2.2 XBUS Reads 

X-Bus reads are split into two parts: address transfer and data return. The BIF arbitrates for an
address transfer to initiate a data or instruction cache miss. The bus interface then awaits data return.
The BIF arbitrates for data return only when responding as a slave to a CSR read.

2.2.1 Read Initiating

When the BIF wins the bus, and decides that a read is the highest priority task, it will transfer the read
address and issue either a READ or a READ MULTIPLE command. It will issue a READ command if the
CPU request was less than or equal to 32 bits and was either unencacheable or would change the bus
lock status. The BIF will issue a READ MULTIPLE command otherwise.

If the request was a READ, the byte mask accompanying the address will decide the exact request
size.

If the request was a READ MULTIPLE, additional request information is provided in the address and
data fields. The information is summarized in the next figure. The WE field will always be 01. The LL
field will be 00 for a 64 bit read, 01 for a data cache normal fill, 10 for an instruction cache fill and 11
for an extended data cache fill. The LONGWORD COUNT field will be unused in processor requests.



XBUS READ MULTIPLE

[Figure: XBUS READ MULTIPLE request format, showing the PHYSICAL ADDRESS, LL, WE and LONGWORD COUNT fields]

LL                                   WE

00  TRANSFER LENGTH = 2 LONGWORDS    00  USE LONGWORD COUNT, MODULO WRAP

01  TRANSFER LENGTH = 4 LONGWORDS    01  LENGTH SPECIFIED BY LL, MODULO WRAP

10  TRANSFER LENGTH = 8 LONGWORDS    10  USE LONGWORD COUNT

11  TRANSFER LENGTH = 16 LONGWORDS   11  LENGTH SPECIFIED BY LL
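
The command choice and the LL encodings of section 2.2.1 can be written out in a few lines. The sketch below is illustrative only; the function name, argument names, and dictionary keys are hypothetical, and only the READ versus READ MULTIPLE rule and the LL values come from the text.

```python
def read_command(size_bits, encacheable, changes_lock):
    """Choose the X-Bus read command for a CPU request.

    READ is used for requests of 32 bits or less that are unencacheable
    or would change the bus lock status; READ MULTIPLE otherwise.
    """
    if size_bits <= 32 and (not encacheable or changes_lock):
        return "READ"
    return "READ MULTIPLE"


# LL field values for READ MULTIPLE, with the transfer length in longwords.
LL_FIELD = {
    "read_64": (0b00, 2),
    "dcache_normal_fill": (0b01, 4),
    "icache_fill": (0b10, 8),
    "dcache_extended_fill": (0b11, 16),
}
```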



There can be multiple reads outstanding on the X-Bus from a single CPU. In such a case, returning
read data is distinguished by the subid field. Subid = x0 is used for the data cache. Subid = x1 is used
for the instruction cache.

The read address is sourced by the CBA gate array, but the virtual page offset within segment, or
VPN, is provided by the CBD ones. When the read address is transferred, the CBA gate array captures
the associated VPN for subsequent use during each fill and DTS update.

2.2.1.1 Read Initiation Bypass

When a read MMU command is being decoded by the BIF and there are no previous internal requests
pending, the arriving PA will be forwarded immediately to the X-Bus outbound address register. If the
BIF is the default bus owner, and no external bus requests are pending, and internal request initiation
is not suspended for any reason, the read request will be initiated in the following bus cycle.

2.2.2 Read Data Return 

After the BIF initiates a bus read, it waits for the return of read data. Several outcomes are possible:
data returns as expected, data returns but is in error, and data fails to return.

The expected data return is either one (READ) or more (READ MULTIPLE) data transfers identified as
READ RESPONSE's. The returning data will appear on the 64 bit bus aligned as if in memory: byte 000,
if present, in bit positions 63:56 and so on. If multiple READ RESPONSE cycles are expected, they will
either be immediately abutting or have intervening NOP's. If there are intervening NOP's, there will
always be at least 2 such NOP's and arb inhibit will be asserted by the responder to prevent any
intervening unrelated bus operations.
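
The byte alignment described above can be expressed as a lane calculation. This sketch assumes that "and so on" means each successive byte occupies the next lower 8 bit lane, ending with byte 7 in bits 7:0; the function name is hypothetical.

```python
def byte_lane(byte_offset):
    """Return (msb, lsb) bus bit positions for a byte offset 0..7.

    Assumption: byte 0 is in bits 63:56 and lanes descend from there.
    """
    if not 0 <= byte_offset <= 7:
        raise ValueError("byte offset must be in the range 0..7")
    msb = 63 - 8 * byte_offset
    return msb, msb - 7
```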

If bad data is returned, the accompanying command code will be READ RESPONSE ERROR. This may
be caused by the detection of an uncorrectable ECC or parity error. It may also occur because of a
bus timeout or address error in the responding device. No further data will be returned subsequent to
a READ RESPONSE ERROR. A READ RESPONSE ERROR may occur in any cycle of a multiple transfer
read return bus sequence.

The last possible outcome for a read is for the read data to fail to return. This can only happen in the
presence of a hardware failure.

2.2.3 Read Return Timeout 

The failure of read data to return is detected by the expiration of the BIF's bus timer while a read
request remains outstanding on the bus. This is the same timer used in bus acquisition timeout. As
mentioned in section 2.1.4, the timer is started when any request is posted. If arbitration succeeds
and a write or tb invalidate follows, the timer is stopped after receiving either an ACK or a NOACK
acknowledge. If arbitration succeeds and a read issue follows, the timer is continued. If the timer
then expires before the last read data returns, a read return timeout occurs. If a timeout occurs, the
system is assumed broken and a clock freeze request is made of the SCR. The internal BIF state is
preserved insofar as possible.

If two reads are concurrently outstanding, the timer is restarted when read data return completes for
each request. This may result in a somewhat longer timeout for the second read request.

If a second request, whether read, write or tb invalidate, is issued while a read is outstanding, the timer
is not stopped. This may result in a somewhat shorter bus acquisition timeout for these subsequent
requests that will expire coincidentally with the read data return timeout.

2.2.4 Read Return Minimum Time 

The READ RESPONSE for a READ or READ MULTIPLE command must be no sooner than the first cycle 
after the acknowledge cycle for the address transfer. This is also the minimum time possible within the 
bus protocol except for default bus owners. 

2.2.5 Read Return Acknowledge 

The BIF will either successfully acknowledge, or error acknowledge, a READ RESPONSE addressed to
it. If an error acknowledge is generated, the returning data will be forwarded as if correct to the data
or instruction caches. Error status will be recorded in the embedded scan state and a clock freeze of
the SCR will be requested.

2.3 XBUS Writes 

When the BIF wins the bus, and decides that a write is the highest priority task, it will transfer the write
address and data. Either a WRITE or a WRITE MULTIPLE command is sent. The BIF will issue a WRITE
command if the data to transfer is less than or equal to 32 bits. The BIF will issue a WRITE MULTIPLE
command if the data to transfer is 64 bits or more.

If the request was a WRITE, the data accompanies the address and the associated byte mask decides
the exact request size.

If the request was a WRITE MULTIPLE, the address and transfer direction are sent in the first cycle. Bit
32 is 0 if the address is ascending, and bit 32 is 1 if the address is descending. The second and
subsequent cycles transmit 64 bits of data accompanied by a WRITE DATA command. Note that all
transfers begin and end on quadword boundaries.

2.3.1 XBUS Write Multiple Limit

The BIF will continually monitor its internal write address and data queue to determine if the next write
data to transfer is an adjacent address quadword. If so, the write multiple will be sustained. To
prevent excessive bus use by one processor, the BIF will stop a write multiple arbitrarily at every 256
byte boundary (32 transfers). Write multiple data will always be sent in immediately adjacent bus
cycles.

No odd longword start write multiples will be generated by the BIF.
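
The 256 byte limit can be illustrated by splitting a run of adjacent quadword writes into bursts. The sketch assumes ascending byte addresses and a quadword aligned start; the function name and the (address, count) tuple format are hypothetical.

```python
def split_bursts(start, quadwords):
    """Split a run of adjacent quadword writes at every 256 byte boundary.

    start:     byte address of the first quadword (assumed 8 byte aligned)
    quadwords: number of adjacent 64 bit transfers queued
    Returns a list of (start_address, transfer_count) bursts.
    """
    if start % 8:
        raise ValueError("start must be quadword aligned")
    bursts, addr, left = [], start, quadwords
    while left > 0:
        room = (256 - addr % 256) // 8   # transfers left before the boundary
        count = min(left, room)
        bursts.append((addr, count))
        addr += 8 * count
        left -= count
    return bursts
```

No burst in the result exceeds 32 transfers, which matches the figure quoted in the text.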

2.3.2 XBUS Initial Write Hold Off

The BIF will not attempt to transfer write data as soon as the request is posted. Rather, the BIF will
delay in anticipation that subsequent writes to adjacent addresses are likely. The request is finally
posted only if one of the following conditions is true.

• If a second write to any address is queued.

• If the pending write was not encacheable.

• If the pending write would unlock the bus.

• If there is a pending data cache miss which collides in address with the pending write.

• If there is a pending data cache miss that is unencacheable or would change the bus lock
status.

• If the free running BIF counter overruns (safety measure).

• If the write is really a TB invalidate.
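
The hold off conditions listed above amount to a single predicate over the state of the write queue. A minimal sketch follows; the function and parameter names are hypothetical and each flag stands for one of the listed conditions.

```python
def should_post_write(second_write_queued=False,
                      not_encacheable=False,
                      unlocks_bus=False,
                      miss_collides=False,
                      miss_unencacheable_or_lock_change=False,
                      counter_overrun=False,
                      is_tb_invalidate=False):
    """Return True when the held-off write should finally be posted.

    Any one of the listed conditions is sufficient.
    """
    return any((
        second_write_queued,
        not_encacheable,
        unlocks_bus,
        miss_collides,
        miss_unencacheable_or_lock_change,
        counter_overrun,
        is_tb_invalidate,
    ))
```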



2.3.3 XBUS Write Monitoring 

All X-Bus writes are monitored even if they are not directed to, or originated by, the local BIF. The BIF
will determine if a copy of the data at the write address has been locally cached. If so, the BIF will
schedule an invalidate of that cache entry. This relies upon the BIF maintaining duplicate tag stores
and is detailed in chapter 5.

2.3.4 XBUS Writes To BIF CSR's

When the BIF detects a 32 bit write into its own register range, a WRITE MULTIPLE of 2 longwords is
substituted for a WRITE command.

2.3.5 XBUS Write Multiple Acknowledge

The acknowledge for the WRITE MULTIPLE command will be OK only when the slave can accept at least
the first 64 bits of data.

The acknowledge for the WRITE DATA command associated with a write multiple will be busy if the
associated 64 bits of data cannot be accepted and must be retransmitted.

An error or no acknowledge for a WRITE DATA command will be interpreted as a busy acknowledge in
order to preserve state. It is presumed the acknowledge driver will freeze the clocks.

2.4 XBUS Slave Response: CSR Access, Interrupt Posting 

The BIF holds 5 operationally available registers: ERRADDR, BCTRL, ICTRL, PTIMER, and ISUM. The
registers are detailed in chapter 7. Access to these registers is over the X-Bus. In addition, the BIF
posts interrupts to the local processor in response to bus writes.

The addresses to which the BIF responds as a slave device follow.

BIF REGISTER ADDRESSES

00pp 0200:              Interrupt Summary Register (ISUM)

00pp 0208:              Interrupt Control Register (ICTRL)

00pp 0210:              Bus Control Register (BCTRL)

00pp 0218:              Bus Error Address Register (ERRADDR)

00pp 0220:              Process Timer (PROC_TIMER)

00pp 0100 - 00pp 013C:  Interrupt Posting Addresses

pp = PROCESSOR NUMBER
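
The address list can be read as a base that depends on the processor number plus a fixed register offset. The sketch below assumes the addresses are 32 bit hexadecimal values with the processor number pp occupying bits 23:16; that layout, the function names and the constant names are assumptions made for illustration.

```python
ISUM_OFFSET = 0x0200         # Interrupt Summary Register offset
BCTRL_OFFSET = 0x0210        # Bus Control Register offset
INT_POST_FIRST = 0x0100      # first interrupt posting offset
INT_POST_LAST = 0x013C       # last interrupt posting offset


def bif_address(pp, offset):
    """Build a slave address of the form 00pp oooo (assumed layout)."""
    if not 0 <= pp <= 0xFF:
        raise ValueError("processor number must fit in one byte")
    return (pp << 16) | offset


def is_interrupt_post(addr, pp):
    """True if addr falls in the interrupt posting range of processor pp."""
    return bif_address(pp, INT_POST_FIRST) <= addr <= bif_address(pp, INT_POST_LAST)
```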



2.4.1 XBUS Slave Response: CSR Read Return 

The BIF will decode all incoming read requests. If the address matches one allotted to the interface, 32
bits of read data will be returned. The data will be returned in bit positions 63 through 32.

The BIF will sometimes delay register read data response so that the read data will be returned no
sooner than the fourth cycle after the one that provided the read address. This is only necessary
when the BIF is the default bus owner.

The BIF will give a busy response when a second X-Bus read request arrives for a register which has
an X-BUS read underway. Otherwise, all read requests will be accepted.

The BIF will give no response if the read request is for other than 32 bits.

2.4.2 XBUS Slave Response: CSR Write Accept, Interrupt Posting 

The BIF will decode all incoming write requests. If the address matches one allotted to the interface,
the request will be acknowledged.

If the address is one of the interrupt posting locations, a WRITE command is expected. The data and
byte mask are not interpreted.

If the address is one of the accessible CSR's, a WRITE MULTIPLE command is expected. A request
length of 1 or 2 longwords is expected with the data provided in bit positions 63 through 32 of the first
WRITE DATA command. This is necessary because of the positioning of the CSR registers in the CBA
IC.

The BIF will give a busy acknowledge when an X-Bus write request of any type arrives for a register
which has an X-BUS read underway.

The BIF will give an error acknowledge when it detects a parity error in write data. A WRITE MULTIPLE
to an interrupt posting address, or a simple WRITE directed at a CSR, will also generate an error
acknowledgement. In either case, embedded state will be set and a clock freeze request to the SCR
generated.

2.5 XBUS TB Invalidates

The local processor can issue TB invalidates for broadcast over the X-Bus. The BIF accepts, queues
and delivers to the X-Bus TB invalidates as if they were writes.

2.5.1 XBUS TB Invalidate Issuing

The BIF will transmit TB invalidate requests accompanied by the commands INVAL TB SEL and
INVALIDATE TB. If the former command is issued, the address field can be assumed to hold the virtual page
address of the entry to be invalidated. The virtual page number, address bits 31 through 12, can be
found on the bus in bit positions 63 through 44.



No acknowledge is expected or awaited upon the issue of a TB Invalidate command. 



[Figure: X-BUS bits 63:32, with the VIRTUAL PAGE NUMBER in bits 63:44]

The Virtual Page Number is transferred on X-BUS bits 63:44 during INVAL TB SEL and INVALIDATE TB
commands.
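
The placement of the virtual page number on the bus is a fixed shift: address bits 31:12 land in bus bits 63:44. A short sketch, with hypothetical function names, shows the packing and its inverse.

```python
VPN_MASK = 0xFFFFF   # 20 bits: address bits 31 through 12


def vpn_to_bus(virtual_address):
    """Place address bits 31:12 into bus bit positions 63:44."""
    vpn = (virtual_address >> 12) & VPN_MASK
    return vpn << 44


def bus_to_vpn(bus_word):
    """Recover the virtual page number from bus bits 63:44."""
    return (bus_word >> 44) & VPN_MASK
```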

2.5.2 XBUS TB Invalidate Accepting 

The BIF will unconditionally accept all X-BUS TB invalidate requests and forward them to the MMU
through the invalidate queueing mechanism. Chapter 5 provides additional explanation.

2.6 XBUS Locking

The BIF accepts load lock, load unlock and store unlock commands from the MMU. When load lock
completes successfully, that CPU can be assured of holding the bus lock until the CPU explicitly releases
the lock or an error arises. Only one CPU at a time may hold the bus lock and that, in turn,
permits the construction of critical code sections in a multiple processor environment.

Because the holding and release of the bus locks spans many bus cycles, a method for assuring
fairness among CPU's in acquiring the bus lock is also implemented.

2.6.1 XBUS Lock Acquisition and Release 

The BIF will secure the bus lock only when a load lock data cache miss is successfully issued and
acknowledged on the X-BUS. In more detail, first the data cache miss which seeks the bus lock is
posted. This request will push ahead of itself all previously queued up writes. When the lock request
is next to be serviced, the current states of the external bus lock and lock_request signals are examined.
If lock is already asserted by another CPU, the arbitration is deferred. If arbitration is deferred
for this reason, the CPU will assert the lock_request signal and await the deassertion of lock. The
arbitration may also be deferred if this is the second acquisition of the bus lock by the same CPU
without an intervening deassertion of the lock request signal. This lock request deferral assures fair
access to the bus lock among all competitors. If the bus lock is available and there is no need for lock
request deference, arbitration is attempted. If the bus lock signal is subsequently asserted before
the BIF gains access to the X-Bus, the BIF will withdraw from further arbitration and drive the lock_request
signal. When the bus is finally secured, both the arb inhibit and lock signals are simultaneously
asserted. Arb inhibit remains asserted for 3 cycles, which is sufficient time for all other bus interfaces
to see the lock signal asserted and to withdraw from arbitration if they too plan to secure the bus lock.
At the end of 3 cycles, the locking BIF will also examine the state of the acknowledge signals. If other
than a successful acknowledge is detected, the bus lock is immediately released. If released, the lock
signal is deasserted at the end of the cycle following the acknowledge. In all cases, the bus lock_request
signal is deasserted in the first cycle after the lock signal is generated.

The BIF will release the bus lock when a load unlock or a store unlock is successfully issued and
acknowledged. Alternatively, the lock is released upon an error in the local processor. A local processor
error is assumed to result in a processor trap, and the signal trap dispatch is therefore used to
unconditionally release the bus lock. In more detail, first the data cache read or write which seeks to
release the bus lock is posted. This request will push ahead of itself all previously queued up writes.
At the end of 3 cycles, the locking BIF will also examine the state of the acknowledge signals. If other
than a successful acknowledge is detected, the bus lock is retained. Otherwise, the lock signal is
deasserted at the end of the cycle following the acknowledge.

If a lock request is REJECT'd by the BIF, the lock signal and arb inhibit are immediately released.
Similarly, if an unlock request is REJECT'd by the BIF, the lock is retained if held. Section 2.8 describes
the use of the signal REJECT.

2.6.2 XBUS Lock Nesting 

It's possible for the MMU to request the bus lock for PMAPE update while the BIF is already in possession
of the bus lock. For this reason, a second load lock request will be accepted. If two bus lock
requests have been accepted, two bus unlock requests will need to follow before the lock will really be
released. In effect, the BIF nests bus lock requests two levels.
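
The two level nesting behaves like a small depth counter. The sketch below is illustrative: the class name is hypothetical, and raising an error on a third lock request or on an unlock with no lock held is an assumption, since the text does not say what the BIF does in those cases.

```python
class LockNest:
    """Hypothetical model of two-level bus lock nesting."""

    MAX_DEPTH = 2

    def __init__(self):
        self.depth = 0

    def load_lock(self):
        # Assumption: a third nested request is treated as an error here.
        if self.depth >= self.MAX_DEPTH:
            raise RuntimeError("lock nesting deeper than two levels")
        self.depth += 1

    def unlock(self):
        """Process one unlock; return True if the bus lock is really released."""
        if self.depth == 0:
            raise RuntimeError("unlock without a held lock")
        self.depth -= 1
        return self.depth == 0
```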

2.6.3 XBUS Lock Duration Timeout 

The BIF starts a timer when the bus lock is first acquired. The timer remains running so long as the BIF
holds the bus lock. If the timer expires before the lock is released, a lock timeout trap is posted. The
timer duration is approximately 200 microseconds (12 bit counter).

If a timeout trap occurs, the BCTRL register indicates so. The BCTRL register is described in chapter
7.

If a second lock setting request is processed before a held lock is released, the timer is not reset.
This results in a somewhat shorter timeout for the second request.

If an unlock request is being transferred upon the X-Bus, the BIF refrains from arbitration for a new
lock request for at least five cycles including the transferring one. This delay assures that there will
always be two cycles of delay between the release of a lock and its reacquisition by the same BIF.



2.6.4 XBUS Data Consistency Under Lock 

The BIF guarantees that, once a lock has been acquired, all writes on the bus that preceded the
load lock transfer have successfully invalidated the cache. This is a natural outcome of an X-Bus
READ command requiring at least 4 cycles before the READ RESPONSE command will be seen.



2.7 XBUS Request Retry

The BIF will retry any request that receives a BUSY acknowledge. The retry will continue until the bus
timeout expires.

If an address transfer receives a BUSY acknowledge, the request is marked as in retry. There can be
as many as three requests in retry at any one time. Retry requests receive no different priority treatment
than was outlined in section 2.1.5 other than following retry holdoff.

2.7.1 XBUS Retry Holdoff

If a request is in retry, it is not immediately posted to the bus. The minimum request spacing for a
retry is 5 cycles: 3 to make the original transfer and await the acknowledge, 1 to mark the request as in
retry, and 1 to rearbitrate for the bus.

2.8 XBUS Reject

Two successive bus address transfers may be issued by the same BIF in bus cycles spaced apart by
only one NOP or foreign cycle. If the first request receives a busy acknowledge, the acknowledge is
received only after the second request has been sent. In this case, the bus REJECT signal is immediately
asserted. The REJECT signal is interpreted by the slave as nullifying the already accepted request.
This use of REJECT assures that the order of transfers on the bus is retained. This is particularly
important when the second request is a read for the same data that is being written by the first
request.

When REJECT is asserted, the acknowledge for the second request is ignored.

When REJECT is asserted, all transaction side effects, such as bus locking, do not take place.

2.8.1 XBUS Write Order Assurance

The use of REJECT, in cooperation with the write order assurance of the write queue, guarantees that
the write order of one CPU is always preserved as seen by a second CPU. This can permit some forms
of multiprocessor synchronization without the need for bus locking.

DATA CACHE INTERFACE

3.1 Data Cache Read Miss

Processor operand loads are usually satisfied by the data cache. A data cache read miss occurs when
the data cache does not presently have the requested item. A cache read miss also occurs when the
read request must be forwarded to the bus regardless of whether cached data is available. Typical of
this latter situation is a read from an I/O control register.

Cache miss processing is the joint responsibility of the BIF and the MMU. The BIF sources the fill
address and informs the MMU as the data RAM's are written.



3.1.1 MMU Request to the BIF 

The read's 30 bit physical address is provided by the MMU on the PA bus. The MMU command 
accompanies the physical address. 

The read's virtual page offset within segment, VPN, bits will be presented in advance of the physical
address and command. Typically, the 7 bits are captured by the BIF from the external EA register
every cycle. If a read miss occurs, the physical address and command will then arrive in the following
cycle. If, however, the PA bus is not available in this succeeding cycle, the MMU will assert the signal
MMU_HOLD_DVPN. The BIF will hold the captured data cache VPN. MMU_HOLD_DVPN will be deasserted
in the cycle in which the physical address and command are finally sent to the BIF.

There are quite a few commands that apply to data cache miss. They are summarized in the next
table.



MEMJ;MD(4:0] 



00000 


NOP 


10000 


00001 
00010 


toad.nolock.cache. 1 6 


10001 
10010 


00011 


ioad.natocfc.cache.64 


10011 


00100 


fbad.noldck.nocache. 1 


10100 


00101 


toad.nolocfc.nocache.2 


10101 


00110 


load.nolock.nocache.4 


10110 


00111 


load.nolock. nocache. 8 


10111 


01000 


lQad.locfc.nocache. i 


11000 


01001 


load.tocfc.nocache.2 


11001 


01010 


load.lock.nocache.4 


11010 


01011 


load.lock.nocaGhe.6 


11011 


01100 


load.uniocfc.nocache. 1 


11100 


01101 


load . unlock, nocache . 2 


11101 


01110 


load . unlock .nocache . 4 


11110 


01111 


_ load.unlock^nocache.a 


11 in 
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3.1.2 Cacheable Data Read Miss 

In the typical data cache miss, the MEM_CMD(4:0) field is either 00001, LOAD.NOLOCK.CACHE.16, or
the field is 00011, LOAD.NOLOCK.CACHE.64. The first command requests a cache fill of 16 bytes.
The second command requests a cache fill of 64 bytes. This second command is issued only if the
cache miss is triggered by a 64 bit floating point load at an address boundary that is zero mod 64.

The address presented with the data is the IP's exact load address. Before forwarding to the X-Bus,
address bit 3 must be unconditionally zeroed if a 16 byte fill. Address bits 5, 4 and 3 will naturally be
zero if a 64 byte fill. This is required by the fill algorithm, which is natural order beginning at the nearest
lower byte boundary that is 0 modulo the fill size. The address mask bits must be forced to all ones
before transferring on the X-Bus.
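
The fill address rule above can be sketched as follows. This is only an illustration under stated assumptions, not the BIF's logic: the names `fill_base` and `fill_beats` are hypothetical, addresses are treated as byte addresses, and the 8 bytes per transfer figure is taken from the fill description later in this chapter.

```python
from typing import List


def fill_base(load_addr: int, fill_size: int) -> int:
    """Return the nearest lower address that is 0 modulo the fill size.

    Assumption: the byte offset bits 2:0 are cleared as well; the text
    only discusses bit 3 and above.
    """
    if fill_size not in (16, 64):
        raise ValueError("data cache fills are 16 or 64 bytes")
    return load_addr & ~(fill_size - 1)


def fill_beats(load_addr: int, fill_size: int) -> List[int]:
    """List the 8 byte transfer addresses of a fill in natural order."""
    base = fill_base(load_addr, fill_size)
    return [base + 8 * i for i in range(fill_size // 8)]
```

For a 16 byte fill triggered by a load at 0x1238, the sketch starts the fill at 0x1230 and produces two transfers, 0x1230 and 0x1238.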

3.1.3 Unencacheable Data Read Miss 

A load may reference data that is marked unencacheable. Load data may be declared unencacheable
for one of the following reasons.

• The PMAPE's C bit is set in the virtual address mapping tables.

• The memory reference address is a physical one because virtual translation is not enabled.

• The memory reference address is a physical one required for an MMU table walk.

• The memory reference address is a physical one caused by a load.physical instruction.

• The CPU's instruction is a load.lock, requiring access to the bus.

• The CPU's instruction is a load.unlock, requiring access to the bus.

The caching decision is made by the MMU and communicated in the MMU command field. All of the
remaining data cache miss codes other than those just mentioned in the last section apply to
unencacheable references.

In an unencacheable data cache miss, only the requested data is returned. The address presented
with the MMU command is forwarded as is to the X-Bus, and the read mask is appropriately
constructed to reflect the request size. If the request is for an 8 byte quantity, a read multiple of 2
longwords will result.
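
As an illustration of how the data cache miss codes split into cacheable and unencacheable requests, the following sketch decodes a MEM_CMD value. The names `LoadCommand` and `decode_load_command` are hypothetical. The bit-field reading used for codes 00100 through 01111 (bits 4:2 select nolock, lock or unlock, bits 1:0 give log2 of the byte count) is an observation about the table, not something the text states.

```python
from typing import NamedTuple, Optional


class LoadCommand(NamedTuple):
    lock: str        # "nolock", "lock" or "unlock"
    cacheable: bool  # True for the 16 and 64 byte cache fills
    size: int        # request size in bytes


# Observed pattern in the table for the unencacheable load codes.
_LOCK_KINDS = {0b001: "nolock", 0b010: "lock", 0b011: "unlock"}


def decode_load_command(mem_cmd: int) -> Optional[LoadCommand]:
    """Decode a data cache miss MEM_CMD; None means NOP or not a load."""
    if not 0 <= mem_cmd <= 0b11111:
        raise ValueError("MEM_CMD is a 5 bit field")
    if mem_cmd == 0b00001:
        return LoadCommand("nolock", True, 16)
    if mem_cmd == 0b00011:
        return LoadCommand("nolock", True, 64)
    group = mem_cmd >> 2
    if group in _LOCK_KINDS:
        return LoadCommand(_LOCK_KINDS[group], False, 1 << (mem_cmd & 0b11))
    return None
```

Codes that carry no load command in the table (00000, 00010, and 10000 upward) decode to None in this sketch.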

3.1.4 Load.Lock 

The load.lock instruction requires access to the X-Bus to gain the bus lock. For this reason an
unencacheable data miss is declared by the MMU. When the load.lock's data returns, the bus lock can be
assumed to be secured.

The MMU may issue a second locking read request before a previously acquired lock is released. The
MMU may do so while processing a secondary TB miss during a locked code sequence. The BIF will
properly nest this second request.

3.1.5 Load.Unlock 

The load.unlock instruction requires access to the X-Bus to release the bus lock. For this reason an
unencacheable data miss is declared by the MMU. When the load.unlock's data returns, the bus lock
can be assumed to be released.

This instruction may be issued even when the bus lock is not held. This instruction will not release a
bus lock not held by this CPU.

3.1.6 Data Cache Read Data Return 

Once the data cache miss read address is transferred across the bus, the BIF awaits read data
response. When the requested data finally returns, it is forwarded to the DATA(63:00) bus. The data is
then used by the IP, FP or MMU and is optionally stored in the cache. The cache updating is referred to
as filling.

3.1.6.1 Data Return Delay 

Normally, returning read data is forwarded to the DATA bus in the cycle immediately following the data 
transfer on the X-Bus. In some cases however, DATA bus forwarding is delayed one additional cycle. 
The cases are summarized. 

• The X-Bus data returns in the same cycle that the EASRC bus is being used to process an
invalidate. A data cache fill cannot take place in the next cycle because the EA will not
hold the proper fill address.

• The X-Bus data returns in a cycle immediately after an instruction cache miss that required
delayed data forwarding. The immediately abutting X-Bus data returns do not afford an
opportunity to remove the instruction cache miss's delay. The instruction cache fill may
collide in the use of the PC in the same manner as just described for EA's use during data
cache fill.

• The data read request was unencacheable. In this case, the possible need to rotate the
returning read data requires an additional cycle of delay.

The data return delay is not visible to the MMU in handshake protocol.

3.1.6.2 Data Return Alignment 

If the data read request is unencacheable, and is for one longword or less, and the longword address
is even, the returning read data will be duplicated on the cache data bus. This is
required by the MMU, which can access only DATA(31:00). In all other cases, the returning data will be
aligned on the DATA bus as it appears on the X-BUS.

3.1.6.3 Data Cache Fill Data Sourcing / MEM_RESP

If the data cache read miss is for a 16 or 64 byte fill, the requested data is provided 8 bytes at a time
on the X-BUS. The data is then forwarded 8 bytes at a time to the DATA bus and written
simultaneously with the data being accepted by the IP or FP.

The BIF will begin driving returning X-Bus data before X-Bus Read Response data has arrived. The BIF
will first drive the bus in the cycle after the data cache miss MEM_CMD has been driven by the MMU.

Simultaneously with the DATA bus driving, the MEM_RESP(2:0) field is sourced by the MMU.
Typically, code 001 will be driven. Codes 100 and 101 will be driven in the event of bus error. The data
cache filling is strictly slaved to the X-Bus timing and normally takes place in uninterrupted cycles.
See ECCU/ECCC below for the exceptions to this.



Apollo Confidential November 12, 1986 




EP 0 366 434 A2 






MEM_RESP[2:0] - Data Cache Miss

000    NOP
001    Dcache Data Return
100    Load ECCU
101    Load No Response
110    Fetch ECCU
111    Fetch No Response
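
The response codes can be collected into a small enumeration, as sketched below. The names `MemResp` and `is_error` are hypothetical. Code 010 (Icache Data Return) is taken from the MEM_RESP table in the instruction cache chapter. Code 011 does not appear in either table and is left undefined here.

```python
from enum import IntEnum


class MemResp(IntEnum):
    NOP = 0b000
    DCACHE_DATA_RETURN = 0b001
    ICACHE_DATA_RETURN = 0b010
    LOAD_ECCU = 0b100
    LOAD_NO_RESPONSE = 0b101
    FETCH_ECCU = 0b110
    FETCH_NO_RESPONSE = 0b111


# The four codes the text says are driven in the event of a bus error.
_ERROR_CODES = frozenset({
    MemResp.LOAD_ECCU,
    MemResp.LOAD_NO_RESPONSE,
    MemResp.FETCH_ECCU,
    MemResp.FETCH_NO_RESPONSE,
})


def is_error(resp: MemResp) -> bool:
    """True for the codes that report a bus error."""
    return resp in _ERROR_CODES
```

Looking up an undefined code such as `MemResp(0b011)` raises `ValueError`, which matches the fact that the tables give it no meaning.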



3.1.6.4 Data Cache Fill Parity Sourcing 

The returning data parity is regenerated while the data is on the DATA bus. If the request was a 16 or
64 byte fill, the parity is written into the data cache parity RAM's in the following cycle. Byte parity is
maintained in the data cache.



3.1.6.5 Data Cache Fill Address Sourcing / BIF_PAARB BIF_INVOP

If the data cache read miss is for a 16 or 64 byte fill, the fill index is sourced by the BIF on the PA bus.
The BIF requests this use of the PA bus one cycle in advance of the address transfer (two cycles in
advance of the DATA transfer) by asserting the BIF_PAARB(1:0) signals. BIF_PAARB = 01 requests the
joint use of the PA bus and the EASRC bus in anticipation of data cache fill. If there are simultaneous
instruction and data cache misses posted, BIF_PAARB = 11 will be asserted. This requests both the
PCSRC and EASRC buses in case either returns on the bus.

The BIF will begin requesting the PA bus before X-Bus Read Response data has arrived. The BIF will
first make an arbitration request on the PAARB signals in the X-Bus acknowledge cycle for the miss
read address transfer.



BIF_PAARB[1:0]

00    NOP
01    Arbitrate for PA/EASRC : cache fill or invalidate
10
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate
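
The arbitration codes can be read as two independent request bits, as sketched below. This reading is an observation, not a statement from the text: bit 0 is taken to request the EASRC bus and bit 1 the PCSRC bus, with the meaning of code 10 (PA/PCSRC) taken from the BIF_PAARB tables later in the document. The function name `paarb_code` is hypothetical.

```python
def paarb_code(need_easrc: bool, need_pcsrc: bool) -> int:
    """Build the 2 bit BIF_PAARB request from the buses wanted with PA.

    00 = NOP, 01 = PA/EASRC, 10 = PA/PCSRC, 11 = PA/EA/PCSRC.
    """
    return (int(need_pcsrc) << 1) | int(need_easrc)
```

A data cache fill alone gives `paarb_code(True, False)`, and simultaneous instruction and data cache misses give `paarb_code(True, True)`.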



The BIF sources the 13 bit fill index on PA(15:03) one cycle in advance of the DATA transfer.
Simultaneously, the BIF requests the setting of the data cache tag's 8 VALID bits in that next cycle by
deasserting the BIF_INVOP[1:0] signals. BIF_INVOP = 00 implies setting the valid bits.

BIF_INVOP[2:0]

000    0    NOP
001    1    RESET VALID BITS
010    2    Selective TB Invalidate
011    3    Comprehensive TB Invalidate
100    4    Fill
101    5    Diagnostic Fill
110    6    undefined
111    7    undefined
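
The 13 bit fill index on PA(15:03) described above is a plain bit-field extraction. The sketch below assumes that PA bit positions correspond directly to physical address bit positions; the helper names `bit_field` and `data_fill_index` are hypothetical.

```python
def bit_field(value: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def data_fill_index(pa: int) -> int:
    """Return the 13 bit data cache fill index carried on PA(15:03)."""
    return bit_field(pa, 15, 3)
```

Bits above 15 and the byte offset bits 2:0 do not contribute to the index in this sketch.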



3.1.6.6 Data Cache Fill: MMU Tracking

While the BIF sources both the data and fill address, the RAM strobes and tag contents are provided by
the MMU. The MMU does so in response to the BIF_PAARB and BIF_INVOP signals. The BIF sources
these signals without knowing about return data availability. The BIF informs the MMU that data has
been written only after the fact, by means of the MEM_RESP(2:0) signals.

The MMU guesses that the fill will be complete next cycle when the final fill entry index is on the PA bus
and there is no request on the BIF_PAARB signals. If for some reason the fill does not complete in this
cycle, both the MMU and BIF back up and try again. The MMU recognizes this situation by observing
that the MEM_RESP field is 000 (NOP) in the cycle which should have been the last RAM data write.

3.1.7 Data Cache Read Miss Errors 

Quite a few errors are possible in the course of processing a data cache read miss. They are
summarized in this section.

3.1.7.1 External Invalidate Collision 

In the interval between the read address transfer on the X-Bus and the read data return, a write to the
returning data from another CPU is possible. The BIF watches for this situation and detects any
write-read collision on the same physical page. If a collision is detected, the BIF_INVOP signals are asserted
rather than deasserted in the cycle before the data cache write. BIF_INVOP = 01 will reset the tag's 8
valid bits.

BIF_INVOP[1:0]

00    NOP
01    Reset Data/Inst Tag Valid Bits
10
11

This write-read collision detection applies only to an external write. A locally generated write will only
be issued on the X-Bus subsequent to a data cache read if the write was generated earlier in time,
and the write does not conflict in address with the read.
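
The collision check can be sketched as a page-number comparison. The 4 KB page size below is an assumption inferred from the virtual page number occupying address bits 31 through 12 elsewhere in this document; the text does not state the physical page size. The function names are hypothetical.

```python
from typing import Iterable

PAGE_SHIFT = 12  # assumption: 4 KB pages (page number in bits 31:12)

BIF_INVOP_NOP = 0b00          # deasserted: valid bits are set by the fill
BIF_INVOP_RESET_VALID = 0b01  # asserted: the tag's valid bits are reset


def same_physical_page(addr_a: int, addr_b: int) -> bool:
    """True when both physical addresses fall on the same page."""
    return (addr_a >> PAGE_SHIFT) == (addr_b >> PAGE_SHIFT)


def invop_for_fill(read_addr: int, external_writes: Iterable[int]) -> int:
    """Pick the BIF_INVOP code driven in the cycle before the cache write.

    external_writes holds the addresses of writes from other CPUs seen
    between the read address transfer and the read data return.
    """
    for write_addr in external_writes:
        if same_physical_page(read_addr, write_addr):
            return BIF_INVOP_RESET_VALID
    return BIF_INVOP_NOP
```

Comparing whole pages rather than exact addresses is conservative: it may invalidate a fill that was not actually overwritten, but it never keeps stale data.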

3.1.7.2 Bus Acquisition Timeout

If the bus acquisition timer elapses before the data cache read gains access to the bus, a hardware
failure is presumed. The BIF requests the clocks to stop and records this error status in scan state.
The BIF continues to arbitrate for the bus.

3.1.7.3 No Acknowledge

If the data cache miss address transfer results in no bus acknowledge, a software failure is presumed.
The BIF records this error status in the BCTRL register and freezes the ERRADDR register. The BIF
returns a LOAD_NO_RESPONSE code, 101, on the MEM_RESP(2:0) signals.

3.1.7.4 Error Acknowledge

If the data cache miss address transfer results in an error bus acknowledge, a hardware failure is
presumed. The BIF records this error status in scan state. The BIF otherwise acts as if it was a busy
acknowledge to preserve state.

3.1.7.5 Read Return Timeout 

If the read return timer elapses before the data cache read data completely returns, a hardware failure
is presumed. The BIF requests the clocks to stop and records this error in the scan state. The BIF
continues to await read return data.
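
The four bus-level error cases described so far (3.1.7.2 through 3.1.7.5) can be tabulated as data. The structure and names below (`Disposition`, `DATA_READ_MISS_ERRORS`) are hypothetical; the field values are transcribed from the text. A `stops_clocks` value of False means only that the text does not mention a clock stop for that case.

```python
from typing import NamedTuple, Optional


class Disposition(NamedTuple):
    presumed_failure: str     # "hardware" or "software"
    recorded_in: str          # where the BIF records the error
    stops_clocks: bool        # True if the BIF requests a clock stop
    mem_resp: Optional[int]   # MEM_RESP code returned, if any
    bif_action: str           # what the BIF does afterwards


DATA_READ_MISS_ERRORS = {
    "bus_acquisition_timeout": Disposition(
        "hardware", "scan state", True, None,
        "continues to arbitrate for the bus"),
    "no_acknowledge": Disposition(
        "software", "BCTRL register", False, 0b101,
        "freezes the ERRADDR register"),
    "error_acknowledge": Disposition(
        "hardware", "scan state", False, None,
        "treats it as a busy acknowledge"),
    "read_return_timeout": Disposition(
        "hardware", "scan state", True, None,
        "continues to await read return data"),
}
```

Only the no acknowledge case is presumed to be a software failure, and it is the only one of the four that is reported to the MMU through a MEM_RESP code.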

3.1.7.6 ECCU 

A device error may prevent correct data return. The most common such error is a main memory 
ECCU. This same situation will also occur when a secondary bus gets a read timeout. 

When only incorrect data can be returned, a READ RESPONSE ERROR command will be returned on
the X-Bus. The BIF, in turn, will terminate the transfer. The MEM_RESP(2:0) code LOAD ECCU, 100,
will be sent to the MMU.

If the READ RESPONSE ERROR occurs as one response in a READ MULTIPLE, no further response data
will be accepted from the X-BUS.

3.1.7.7 ECCC 

A correctable data error can occur upon access to main store. If this happens in an unencacheable
reference, it is not visible to the MMU. If this happens in a 16 or 64 byte fill, this may result in the
interpositioning of NOP's within the returning X-BUS read data. When a NOP interrupts this sequence,
there will always be at least 2 NOP's present.

When the NOP interrupts the fill sequence, incorrect data is written to the RAM's. The BIF then backs
up the fill address by eight bytes, awaits the corrected data, and rewrites the RAM location.

When the NOP arrives instead of the last 8 bytes of read return data, there is an additional complication
in that the BIF may have relinquished control of the PA bus. The MMU will recognize this situation and
hold the processor stall. The BIF rearbitrates for the PA and EASRC buses, then sources the last fill
address and waits for corrected data. The need to arbitrate, then resupply the former fill address
requires the two NOP's.

If a data returning X-Bus sequence is interrupted by NOP's, the responder will assert arb inhibit to
prevent another party from gaining access to the bus. In consequence, the BIF does not have to be
prepared to handle external invalidates or instruction cache read data response during such an
interruption.

3.2 Data Cache Invalidates

Data cache invalidates may be posted from the BIF to the data cache. The overall sequencing of data
cache invalidate is described in chapter 5.

3.2.1 Data Cache Invalidate Address Sourcing / BIF_PAARB BIF_INVOP

The BIF provides only the invalidate index for the cache location to be purged. The address is
transferred over the PA bus. The BIF requests this use of the bus one cycle in advance of the address
transfer (two cycles in advance of the tag invalidate) by asserting the BIF_PAARB(1:0) signals.
BIF_PAARB = 01 requests the joint use of the PA bus and the EASRC bus. BIF_PAARB = 11 requests
the joint use of the PA bus, EASRC bus and PCSRC bus. This code is used if both caches are to be
invalidated.



BIF_PAARB[1:0]

00    NOP
01    Arbitrate for PA/EASRC : cache fill or invalidate
10    Arbitrate for PA/PCSRC : cache fill or invalidate
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate

The 13 bit invalidate index will be on PA(15:03) one cycle in advance of the tag RAM write.
Simultaneously, the BIF requests the clearing of the data cache tag's 8 VALID bits in that next cycle by
asserting the BIF_INVOP[1:0] signals. BIF_INVOP = 01 will reset the tag's 8 valid bits.

BIF_INVOP[2:0]

000    0    NOP
001    1    RESET VALID BITS
010    2    Selective TB Invalidate
011    3    Comprehensive TB Invalidate
100    4    Fill
101    5    Diagnostic Fill
110    6    undefined
111    7    undefined



3.3 Data Cache Writes 

Processor store data is both written to the data cache and forwarded to the X-Bus. This write through
cache strategy requires the BIF to handle processor writes effectively.

Unlike reads, the CPU does not wait for a write request completion. The BIF simply queues the write
data and address. This decouples the CPU from X-BUS acquisition latency.

3.3.1 MMU Request to the BIF 

The write's 30 bit physical address is provided by the MMU on the PA bus. The MMU command 
accompanies the physical address. 

The write's virtual page offset within segment, VPN, bits will be presented in advance of the physical
address and command. Typically, the 7 bits are captured by the BIF from the external EA register
every cycle. If a write occurs, the physical address and command will then arrive in the following
cycle. If, however, the PA bus is not available in this succeeding cycle, the MMU will assert the signal
MMU_HOLD_DVPN. The BIF will hold the captured data cache VPN. MMU_HOLD_DVPN will be
deasserted in the cycle in which the physical address and command are finally sent to the BIF.

Properly aligned write data will also be presented in advance of the physical address and command.
Typically, the 64 bits are captured by the BIF from the DATA bus directly every cycle. Again, the physical
address and command will arrive in the following cycle. If, however, the PA bus is not available in this
succeeding cycle or a write buffer full stall is in effect, the MMU will deassert the signal
MMU_HDATA_LD. The BIF will hold the captured data. MMU_HDATA_LD will be reasserted in the cycle
in which the physical address and command are finally sent to the BIF.

There are quite a few commands that apply to data cache write. They are summarized in the next 
table. 



MEM_CMD[4:0]

Code     Command    Code     Command
00000    NOP        10000    store.nolock.cache.1
00001               10001    store.nolock.cache.2
00010               10010    store.nolock.cache.4
00011               10011    store.nolock.cache.8



3.3.2 Cacheable Data Store

In the typical data cache store, the MEM_CMD(4:0) field ranges from 10000 to 10011,
STORE.NOLOCK.CACHE.byte_count. The commands just indicate the store's request size.

The address presented with the command is the IP's exact store address.

Cacheable store data may be combined with previously issued cacheable store data to compose
larger X-Bus transactions. This write compaction is described in chapter 6.

3.3.3 Unencacheable Data Store 

A store may also be declared unencacheable for one of the following reasons. 

• The PMAPE's C bit is set in the virtual address mapping tables.

• The memory reference address is a physical one because virtual translation is not enabled.

• The memory reference address is a physical one required for an MMU table walk.

• The CPU's instruction is a store.unlock, requiring access to the bus.

The caching decision is made by the MMU and communicated in the MMU command field. All of the
remaining data store command codes other than those just mentioned in the last section apply to
unencacheable references.

In an unencacheable data cache store, write compaction is not permitted. The address presented with
the MMU command is forwarded as is to the X-Bus, and the write mask is appropriately constructed to
reflect the exact request size. If the request is for an 8 byte quantity, a write multiple of 2 longwords
will result.



3.3.4 Store.Unlock 

The store.unlock instruction will be handled no differently than any other unencacheable store except
that the bus lock may be released as a side-effect of the X-Bus request completion.

The IP will assume the bus lock is released as soon as the write is queued.

The MMU may issue a second locking read request before a previously acquired lock is released. The
MMU may do so while processing a secondary TB miss during a locked code sequence. The BIF will
properly nest this second request and require two store.unlocks before releasing the bus.

MMU.STORE.UNLOCK differs from other store.unlock's in that the write data will always be provided in
the least significant 32 bits. When the longword store address is even, this requires a special write
rotation before the data may be presented to the X-Bus.

This instruction may be issued even when the bus lock is not held. This instruction will not release a
bus lock not held by this CPU.
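
The nesting behaviour described above can be modelled with a counter. This is a sketch under assumptions: the text says only that a second locking request is nested and that two unlocks are then required, so the counter, the class name `BusLockNesting` and its methods are illustrative rather than a description of the BIF's internals.

```python
class BusLockNesting:
    """Counts outstanding locking requests from this CPU."""

    def __init__(self) -> None:
        self.depth = 0

    @property
    def held(self) -> bool:
        return self.depth > 0

    def lock(self) -> None:
        """A locking read (load.lock) completed on the bus."""
        self.depth += 1

    def unlock(self) -> bool:
        """An unlock completed; return True if the bus lock was released.

        An unlock issued while no lock is held is accepted and releases
        nothing, as the text allows.
        """
        if self.depth == 0:
            return False
        self.depth -= 1
        return self.depth == 0
```

With two nested locks, the first unlock leaves the lock held and only the second one releases it.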

3.3.5 Write Buffer Full 

If the BIF is unable to accept much more store data, it will assert the signal WBUF_FULL back to the
MMU in order to generate back pressure. The MMU interprets the assertion of this signal to mean that
if there is currently a store in its data cache access phase, that store data will be accepted but the
address will not. This will mean that the store must stall in its exception phase.

WBUF_FULL deserves more description than this.
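
A simple queue model illustrates the back pressure idea. Everything numeric here is an assumption: the text gives neither the queue depth nor the point at which WBUF_FULL is asserted, so `depth` and `headroom` are hypothetical parameters, and the model does not capture the split between accepting store data and refusing the address.

```python
from collections import deque
from typing import Deque, Tuple


class WriteQueue:
    """FIFO of (address, data) writes waiting for the X-Bus."""

    def __init__(self, depth: int = 4, headroom: int = 1) -> None:
        if not 0 <= headroom < depth:
            raise ValueError("headroom must be smaller than depth")
        self.depth = depth
        self.headroom = headroom  # slots kept free for stores in flight
        self._entries: Deque[Tuple[int, int]] = deque()

    @property
    def wbuf_full(self) -> bool:
        """Asserted early, while `headroom` slots are still free."""
        return len(self._entries) >= self.depth - self.headroom

    def push(self, address: int, data: int) -> None:
        if len(self._entries) >= self.depth:
            raise OverflowError("write queue overrun")
        self._entries.append((address, data))

    def pop(self) -> Tuple[int, int]:
        """Remove the oldest write once it has gone out on the bus."""
        return self._entries.popleft()
```

Asserting the signal before the queue is completely full leaves room for a store that is already in flight when the stall takes effect.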

3.3.6 Data Cache Write Errors 

The few errors that are possible in the course of processing a data cache write are summarized in this
section.

Because X-Bus writes are one way transfers, device errors such as auxiliary bus timeouts, ECCC's
and ECCU's must be detected and recorded at the write's destination.

3.3.6.1 Bus Acquisition Timeout 

If the bus acquisition timer elapses before the data cache write gains access to the bus, a hardware
failure is presumed. The BIF requests the clocks to stop and records this error in scan state. The BIF
continues to request the bus.

3.3.6.2 No Acknowledge

If the data cache write address transfer results in no bus acknowledge, a software failure is presumed.
The BIF records this error status in the BCTRL register and freezes the ERRADDR register. The write
request is forgotten.

3.3.6.3 Error Acknowledge 

If the data cache write address transfer results in an error bus acknowledge, a hardware failure is
presumed. The BIF records the error status in scan state, but otherwise treats the acknowledge as
a busy one to preserve state.

3.4 TB Invalidates 

Translation Buffer invalidates may either be posted by the MMU for forwarding to the X-Bus, or be
relayed from the X-Bus by the BIF to the MMU. The precise sequencing of TB invalidates is described
in chapter 5.

3.4.1 Invalidates from the MMU 

Similar to data cache writes, the CPU does not wait for a TB invalidate completion. The MMU relays 
and the BIF queues the TB invalidate request. 

There are both selective and comprehensive TB invalidates. There is one MEM_CMD(4:0) code for
each. Code 11000 is for a selective TB invalidate, and a 20 bit virtual address is expected to
accompany it. The virtual address will be provided by the MMU on PA(01:00) || PA(29:12). The address will
be relayed to the X-Bus where it will appear in the address bit positions 31 through 12. Code 11001
identifies a comprehensive TB invalidate. No address is required in this case.

No VPN is associated with a TB invalidate. 

No data is associated with a TB invalidate.

MEM_CMD[4:0]

00000    NOP
11000    Selective TB invalidate
11001    Comprehensive TB invalidate
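
The address packing for an outgoing selective TB invalidate can be sketched as below. The sketch assumes that in PA(01:00) || PA(29:12) the left operand is the most significant part, so that PA(01:00) carries virtual address bits 31:30 and PA(29:12) carries bits 29:12. The function names are hypothetical.

```python
BITS_29_12 = 0x3FFFF000  # mask covering bit positions 29 through 12


def pa_from_virtual(va: int) -> int:
    """Place virtual address bits 31:12 on PA(01:00) || PA(29:12)."""
    return ((va >> 30) & 0b11) | (va & BITS_29_12)


def xbus_from_pa(pa: int) -> int:
    """Rebuild X-Bus address bits 31:12 from the PA bus value."""
    return ((pa & 0b11) << 30) | (pa & BITS_29_12)
```

Round-tripping an address through both functions returns it with the page offset bits 11:0 cleared, which is all a page-granular invalidate needs.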




3.4.2 Invalidates from the MMU: Write Buffer Full 

TB invalidates, both selective and comprehensive, will occupy a position in the write queue.
Consequently, they can result in write buffer full stalls. If the BIF is unable to accept another TB invalidate or
more store data, the BIF will assert the signal WBUF_FULL as described in section 3.3.5.



3.4.3 Invalidates from the MMU: Bus Errors

Only two errors are possible in transmitting a TB invalidate on the X-Bus: failure to secure the bus and
a parity error upon transmission.



3.4.3.1 Bus Acquisition Timeout 

If the bus acquisition timer elapses before the TB invalidate gains access to the bus, a hardware failure
is presumed. The BIF requests the clocks to stop and records this as a write error in the scan state.
The BIF continues to request the bus.



3.4.3.2 Error Acknowledge

If the TB invalidate transfer results in an error bus acknowledge, a hardware failure is presumed. The 
BIF records this as a write error in the scan state. The BIF otherwise treats this acknowledge as a busy 
one to preserve state. 



3.4.4 Invalidates to the MMU

Incoming TB invalidates are forwarded by the BIF to the MMU. The forwarding follows the cache
invalidate pipeline as described in chapter 5.

Both selective and comprehensive TB invalidates may be posted to the MMU. The BIF sources a 20 bit
virtual page number on the PA bus if a selective TB invalidate is required. If a comprehensive invalidate
is desired no address is required, but the BIF will arbitrate for and secure the PA bus nonetheless.

3.4.4.1 External Selective TB Invalidate Address Format 

Incoming TB invalidate addresses are right shifted before transfer across the PA bus. The virtual page
number bits 31 through 12 will be aligned on the PA bus in bit positions 22 through 3.
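
Moving bits 31:12 into positions 22:3 is a right shift by 9, as this short sketch shows. The function names are hypothetical.

```python
SHIFT = 12 - 3          # bit 12 moves to bit 3
PA_FIELD = 0xFFFFF << 3  # 20 bits occupying PA positions 22 through 3


def pa_from_incoming_va(va: int) -> int:
    """Align virtual address bits 31:12 on PA bits 22:3."""
    return (va >> SHIFT) & PA_FIELD


def vpn_from_pa(pa: int) -> int:
    """Recover the 20 bit virtual page number from PA bits 22:3."""
    return (pa >> 3) & 0xFFFFF
```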



3.4.4.2 External TB Invalidate Address Sourcing / BIF_PAARB BIF_INVOP

The BIF uses the BIF_PAARB signals to request the PA bus to transfer the invalidate address. The BIF
will usually request the use only of PA and EASRC buses, BIF_PAARB = 01. If an instruction cache fill is
underway at the same time, BIF_PAARB = 11 will be driven. The decision as to whether to do an
instruction cache fill or TB invalidate can then be deferred one cycle.



BIF_PAARB[1:0]

00    NOP
01    Arbitrate for PA/EASRC : cache fill or invalidate
10    Arbitrate for PA/PCSRC : cache fill or invalidate
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate



Either a selective TB invalidate or a comprehensive TB invalidate is requested in the same cycle as the
PA bus use. If selective, the TB invalidate index will be on PA bus. The BIF requests the selective TB
invalidate by setting BIF_INVOP = 10. If a comprehensive TB invalidate is desired, the BIF sets
BIF_INVOP = 11.



BIF_INVOP[2:0]

000    0    NOP
001    1    RESET VALID BITS
010    2    Selective TB Invalidate
011    3    Comprehensive TB Invalidate
100    4    Fill
101    5    Diagnostic Fill
110    6    undefined
111    7    undefined

INSTRUCTION CACHE INTERFACE

4.1 Instruction Cache Read Miss



Processor instruction fetches are usually satisfied by the instruction cache. An instruction cache read
miss occurs when the instruction cache does not presently have the requested instruction.

In the main, instruction cache read miss processing parallels that of data cache read miss. The major
differences result from the many fewer request types within instruction cache miss.



4.1.1 MMU Request to the BIF

The fetch's 30 bit physical address is provided by the MMU on the PA bus. The MMU command
accompanies the physical address.

The read's virtual page offset within segment, VPN, bits will be presented in advance of the physical
address and command. Typically, the 7 bits are captured by the BIF from the external PC register
every cycle. If an instruction cache miss occurs, the earliest the physical address and command will
arrive is the following cycle. If, however, the PA bus is not used or is otherwise unavailable in this
succeeding cycle, the MMU will assert the signal MMU_HOLD_IVPN. The BIF will hold the captured
instruction cache VPN. MMU_HOLD_IVPN will be deasserted in the cycle in which the physical address
and command are finally sent to the BIF.

There is only one command that applies to instruction cache miss. 



MEM_CMD[4:0]

00000    NOP
00010    fetch.nolock.cache.32

All instruction cache misses are cacheable and 32 bytes in length.

The address presented with the command is the IP's exact fetch address. Before forwarding to the
X-Bus, address bits 3 and 4 must be unconditionally zeroed. This is required by the fill algorithm, which
is natural order beginning at the nearest lower byte boundary that is 0 modulo 32. The address mask
bits must be forced to all ones before transferring on the X-Bus.

4.1.2 Instruction Cache Read Data Return 

Once the instruction cache miss read address is transferred across the X-Bus, the BIF awaits read
data response. When the requested data finally returns, it is forwarded to the INST(63:00) bus. The
instruction is then stored in the cache.

4.1.2.1 Instruction Return Delay 

Normally, returning memory data is forwarded to the INST bus in the cycle immediately following the
data transfer on the X-Bus. In some cases, however, INST bus forwarding is delayed one additional
cycle. The cases are summarized.

• The X-Bus data returns in the same cycle that the PCSRC bus is being used to process an
invalidate. An instruction cache fill cannot take place in the next cycle because the PC will
not hold the proper fill address.

• The X-Bus data returns in a cycle immediately after a data cache miss that required an
insertion delay. The immediately abutting data and instruction fill data responses on the
X-Bus do not afford an opportunity to remove the data cache miss's delay.

The data return delay is not visible to the MMU in handshake protocol.

4.1.2.2 Instruction Return Alignment 

The instruction data is always aligned on the INST bus as it appears on the X-Bus.

4.1.2.3 Instruction Cache Fill Data Sourcing / MEM_RESP

The instruction cache data is provided 8 bytes at a time on the X-Bus and is forwarded to the INST bus
8 bytes at a time. The instruction cache filling is strictly slaved to the X-Bus timing and normally takes
place in uninterrupted cycles. See ECCU/ECCC below for the exceptions to this.

The BIF will begin driving returning X-Bus data before X-Bus Read Response data has arrived. The BIF
will first drive the INST bus in the cycle after the instruction cache miss MEM_CMD has been driven by
the MMU.

Simultaneously with the INST bus driving, the MEM_RESP(2:0) field is sourced by the MMU. Typically,
code 010 will be driven. Codes 110 and 111 will be driven in the event of bus error.

MEM_RESP[2:0] - Instruction Cache Miss

000    NOP
010    Icache Data Return
110    Fetch ECCU
111    Fetch No Response



4.1.2.4 Instruction Cache Fill Parity Sourcing

The returning instruction parity is regenerated while the data is on the INST bus. It is written into the
instruction cache parity RAM's in the following cycle. One bit of parity is maintained over all even
instruction bytes, and one over all odd instruction bytes.

4.1.2.5 Instruction Cache Fill Address Sourcing / BIF_PAARB BIF_INVOP

The instruction cache fill index is sourced by the BIF on the PA bus. The BIF requests this use of the PA
bus one cycle in advance of the address transfer (two cycles in advance of the INST transfer) by
asserting the BIF_PAARB(1:0) signals. BIF_PAARB = 10 requests the joint use of the PA bus and the
PCSRC bus. BIF_PAARB = 11 requests the use of the EASRC bus in addition. This last code would be
used if instruction cache miss and data cache miss are concurrently underway on the X-Bus.

The BIF will begin requesting the PA bus before X-Bus Read Response data has arrived. The BIF will
first make an arbitration request on the PAARB signals in the X-Bus acknowledge cycle for the
instruction miss read address transfer.



BIF_PAARB[1:0]

00    NOP
01
10    Arbitrate for PA/PCSRC : cache fill or invalidate
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate



The BIF sources the 14 bit fill index on PA(29:16) one cycle in advance of the INST transfer.
Simultaneously, the BIF requests the setting of the instruction cache tag's VALID bit in that next cycle by
deasserting the BIF_INVOP signals.

BIF_INVOP[1:0]

00    NOP
01
10
11

4.1.2.6 Instruction Cache Fill: MMU Tracking

While the BIF sources both the data and fill address, the RAM strobes and tag contents are provided by
the MMU. The MMU does so in response to the BIF_PAARB and BIF_INVOP signals. The BIF sources
these signals without knowing about return data availability. The BIF informs the MMU that data has
been written only after the fact, by means of the MEM_RESP(2:0) signals.

The MMU guesses that the fill will complete the next cycle when the final fill entry index is on the PA bus
and there is no request on the BIF_PAARB signals. If for some reason the fill does not complete in this
cycle, both the MMU and BIF back up and try again. The MMU recognizes this situation by observing
that the MEM_RESP field is 000 (NOP) in the cycle which should have been the last RAM data write.

4.1.3 Instruction Stream Writes 

No attempt is made in hardware to interlock stores with instruction stream reads. If a program wishes
to update the instruction stream it must follow this sequence.

• Execute the store.

• Execute a load.unlock. This assures that the store has been accomplished on the X-Bus.

• Wait for the invalidate pipeline to empty (5 instructions).

• Fetch the instruction.



4.1.4 Instruction Cache Read Miss Errors 

The errors that are possible in the course of processing an instruction cache read miss are summa- 
rized in this section. 

4.1.4.1 External Invalidate Collision 

In the interval between the read address transfer on the X-Bus and the read data return, a write to the
returning data from another CPU is possible. The BIF watches for this situation and detects any
write-read collision on the same physical page. If a collision is detected, the BIF_INVOP[1:0] signals are
asserted rather than deasserted in the cycle before the instruction cache write. BIF_INVOP = 01 will
reset the tag's valid bit.

This potential cache invalidation will also apply to locally generated writes.



BIF_INVOP[1:0]

00    NOP
01    Invalidate Instruction/Data Cache
10
11





4.1.4.2 Bus Acquisition Timeout 

If the bus acquisition timer elapses before the instruction cache read gains access to the bus, a
hardware failure is presumed. The BIF requests the clocks to stop and records this error status in the
scan state. The BIF continues to arbitrate for the bus.
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4.1.4.3 No Acknowledge 

If the instruction cache miss address transfer results in no bus acknowledge, a software failure is
presumed. The BIF records this error status in the BCTRL register and freezes the ERRADDR register.
The BIF returns a FETCH_NO_RESPONSE code, 111, on the MEM_RESP(2:0) signals.

Any instruction fetch from a memory region that cannot support an X-Bus READ MULTIPLE will result in
this error. An attempt to fetch from UTILITY board RAM will result in this error.

4.1.4.4 Error Acknowledge 

If the instruction cache miss address transfer results in an error bus acknowledge, a hardware failure is
presumed. The BIF records this error status in the scan state. The BIF otherwise treats this
acknowledge as a busy one in order to preserve state. It is expected that the source of the acknowledge
will request a clock freeze.

4.1.4.5 Read Return Timeout 

If the read return timer elapses before the instruction cache read data completely returns, a hardware
failure is presumed. The BIF records this error status in the scan state. The BIF continues to await
read data return.

4.1.4.6 ECCU 

A device error may prevent correct data return. The most common such error is a main memory
ECCU.

When only incorrect X-Bus data can be returned, a READ RESPONSE ERROR command will be
returned on the X-Bus. The BIF will terminate the transfer. The MMU_RESP(2:0) code FETCH_ECCU,
110, will be sent to the MMU. No further response data for the READ MULTIPLE will be accepted from
the X-Bus.

4.1.4.7 ECCC 

A correctable data error can occur upon access to main store. If this happens in an instruction cache
fill, this may result in the interpositioning of NOPs within the returning X-BUS read data. When a NOP
interrupts this sequence, there will always be at least 2 NOPs present.

When the NOP interrupts the fill sequence, incorrect data is written to the RAMs. The BIF then backs
up the fill address by eight bytes, awaits the corrected data, and rewrites the RAM location.

When the NOP arrives instead of the last 8 bytes of read response data, there is an additional
complication in that the BIF may have relinquished control of the PA bus. The MMU will recognize this
situation and hold the processor stall. The BIF rearbitrates for the PA and PCSRC buses, then sources
the last fill address and waits for corrected data. The need to arbitrate, then resupply the former fill
address requires the two NOPs.

If a data returning X-Bus sequence is interrupted by NOPs, the responder will assert arb inhibit to
prevent another party from gaining access to the bus. In consequence, the BIF does not have to be
prepared to handle external invalidates or data read response during such an interruption.

4.2 Instruction Cache Invalidates

Instruction cache invalidates may be posted from the BIF to the instruction cache. The overall
sequencing of instruction cache invalidate is described in chapter 5.

4.2.1 Instruction Cache Invalidate Address Sourcing / BIF_PAARB BIF_INVOP

The BIF provides only the invalidate index for the cache location to be purged. The address is
transferred over the PA bus. The BIF requests this use of the bus one cycle in advance of the address
transfer (two cycles in advance of the tag invalidate) by asserting the BIF_PAARB(1:0) signals.
BIF_PAARB = 10 requests the joint use of the PA bus and the PCSRC bus. BIF_PAARB = 11 requests
the joint use of the PA bus, EASRC bus and PCSRC bus. This code is used if both caches are to be
invalidated.



BIF_PAARB(1:0)

00    NOP
01
10    Arbitrate for PA/PCSRC : cache fill or invalidate
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate



The 14 bit invalidate index will be on PA(29:16) one cycle in advance of the tag RAM write.
Simultaneously, the BIF requests the clearing of the instruction cache tag's VALID bit in that next cycle by
asserting the BIF_INVOP signals. BIF_INVOP = 01 will reset the tag's valid bit.



BIF_INVOP[1:0]

00    NOP
01    Invalidate Instruction/Data Cache
10
11

CHAPTER 5 INVALIDATE PIPELINE

5.1 Duplicate Tag Stores



The Duplicate Tag Store (DTS) is a copy of the CPU's Instruction and Operand Cache Tag Store used
to compare addresses being modified on the X-BUS against the contents of the caches. If a match
between a location being modified on the X-BUS and a DTS entry is found then that entry is invalidated in
the corresponding cache. Performing this operation without the DTS would mean wasting many cycles
in the caches to compare the cache tags against X-BUS memory modify transactions.

The duplicate instruction tag store is known as DITS. The duplicate data or operand tag store is
known as DOTS.



5.1.1 DTS Addressing 

The DTS are addressed as are the principal caches with virtual addresses. The X-BUS deals only with
physical addresses so that the virtual address of a transaction is formed by using the 12 LSBs of the
physical address, which are the same as the 12 LSBs of the virtual address, and concatenating them
with enough of the virtual address to index the cache. In the case of the CPU's 128kB instruction
cache 5 virtual bits are required. In the case of the CPU's 64kB data cache 4 virtual bits are required.
These bits accompany the physical address on the X-BUS.



[Figure: address layout with fields DUPLICATE TAG STORE INDEX, BYTE ADDRESS WITHIN A PAGE,
VIRTUAL ADDRESS (VPN), PHYSICAL ADDRESS and BYTE SELECT (NOT USED TO INDEX DTS).]

DUPLICATE TAG STORE ADDRESSING. Bits 16 through 3 are used to address the Duplicate Tag Store.
Bits 16 through 12 are taken from the VPN of the X-BUS transaction and bits 11 through 3 are taken
from the Physical Address. One less bit is required to address the Duplicate Operand Cache Store than
the Duplicate Instruction Cache Store. Only 13 bits are used to address the DOTS; bit 16 is tied to a
fixed value.

DITS and DOTS are commonly addressed.
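
The index formation described above can be sketched in a few lines. This is only an illustration: the function name `dts_index`, the argument names, and the choice of 0 for the fixed value of bit 16 in the DOTS case are assumptions, not part of the design.

```python
def dts_index(phys_addr: int, vpn_low_bits: int, is_instruction: bool = True) -> int:
    """Form a duplicate tag store index from address bits 16..3.

    Bits 11..3 come from the physical address. Bits 16..12 (DITS) or
    15..12 (DOTS) come from the virtual bits that accompany the transaction.
    The result is returned right-justified, so bit 3 lands in bit 0.
    """
    physical_part = (phys_addr >> 3) & 0x1FF            # address bits 11..3, 9 bits
    virtual_width = 5 if is_instruction else 4           # DITS uses 5 bits, DOTS 4
    virtual_part = vpn_low_bits & ((1 << virtual_width) - 1)
    # For the DOTS, bit 16 is assumed here to be tied to 0 (hypothetical choice).
    return (virtual_part << 9) | physical_part


# A 14 bit index for the DITS and a 13 bit index for the DOTS.
print(hex(dts_index(0xFF8, 0b11111)))          # all index bits set for the DITS
print(hex(dts_index(0xFF8, 0b11111, False)))   # bit 16 dropped for the DOTS
```

The byte select bits 2..0 never reach the index, which matches the note that they are not used to index the DTS.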



5.1.2 DTS Contents

Each DTS entry contains two fields:
o 18 bit physical tag
o 1 bit parity check bit
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5-2 



Invalidate PIpeiin 



The physical tag is the 18 bit physical page number which along with a 12 bit byte index addresses 1
gigabyte (30 bits) of physical address space.

The parity bit is an odd parity check bit so that the sum of all the bits which are set in the physical tag,
the valid entry bit and the parity bit will be odd.

There is no explicit valid bit. An invalid entry will simply point to an unlikely memory location, 0.
Example:

physical tag = 000000000000000000
parity bit = 1

[Figure: entry layout with numbered bit positions and fields PHYSICAL PAGE NUMBER and PARITY CHECK BIT.]

DUPLICATE TAG STORE CONTENTS. The Duplicate Tag Stores contain an 18 bit physical page number
and a Parity Check Bit.
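
As a small worked example of the odd parity rule, the sketch below computes the parity bit for an 18 bit physical tag. The function names are illustrative only, and since there is no explicit valid bit, the sum is taken over the tag and the parity bit alone.

```python
TAG_BITS = 18
TAG_MASK = (1 << TAG_BITS) - 1


def odd_parity_bit(physical_tag: int) -> int:
    """Return the parity bit that makes the total count of set bits odd."""
    ones = bin(physical_tag & TAG_MASK).count("1")
    return 0 if ones % 2 == 1 else 1


def entry_parity_ok(physical_tag: int, parity_bit: int) -> bool:
    """Check a stored entry: set bits in tag plus parity bit must be odd."""
    ones = bin(physical_tag & TAG_MASK).count("1") + (parity_bit & 1)
    return ones % 2 == 1


# The invalid entry from the example: tag of all zeros, parity bit 1.
print(odd_parity_bit(0), entry_parity_ok(0, 1))
```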



5.2 DTS Functional Overview

Duplicate Tag Store operations can be divided into the following categories:

o DTS lookup
o DTS hit
o DTS allocate from processor write
o DTS allocate from read response

The DTS acts as an imperfect filter for cache invalidates. Any time some other system device
(including another CPU) modifies a memory location the DTS is checked to see if that location is currently
resident in either of the CPU's caches. If it is present then a cache cycle is stolen from the cache that
contains that location and the entry in the cache as well as the entry in the DTS is invalidated. The DTS
may actually have labeled as valid entries which are not valid in the caches. The only effect this will
have is to generate a needless cache invalidate cycle.

The DTS is updated in two separate situations just as the main caches are. The first is when the CPU
modifies a location by executing a STORE operation. The second is when a cache miss is generated
and the data returns on the X-BUS.
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DTS INVAUDATE QUEUE 



BASIC DUPLICATE TAG STORE DATAPATHS. Marked areas indicate off-chip logic.



5.3 DTS Lookup 

A joint lookup of the DITS and DOTS is performed whenever the following transactions are detected on
the X-BUS:

o WRITE from another device

o WRITE MULT followed by WRITE DATA from another device

A lookup only of the DITS is performed whenever the following transactions are detected on the
X-BUS:

o WRITE from this cpu

o WRITE MULT followed by WRITE DATA from this cpu

The DTS lookup is basically handled in three pipeline stages. The stages are slaved to the operation of
the X-BUS.

• COMMAND DECODE

• DTS ACCESS

• TAG COMPARE

5.3.1 DTS Lookup: Write

In the first cycle after the X-BUS write transaction, the CMD field is decoded. If a WRITE operation
is decoded then the address to be used as a DTS index is loaded into the DTS INDEX register. The
following cycle the DITS is accessed in a read operation and the DOTS is optionally accessed. The
tags are compared as required to the physical page number. If the PPN and DTS tag match, a cache
entry invalidate and a DTS entry invalidate are scheduled.




DTS LOOKUP PIPELINE SCHEDULE for WRITE or WRITE UNLOCK

CYCLE 1 A WRITE transaction on bus.
        The transaction is loaded into the BIF's X-BUS input registers.

CYCLE 2 The command is decoded.
        If it is a WRITE the DTS index reg is loaded from the physical address and the VPN.
        The physical address is piped forward for the tag compare(s).

CYCLE 3 A DTS read access takes place, the tag is compared to the physical address.
        If a match occurs a cache entry invalidate and a DTS entry invalidate
        are scheduled.
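
The lookup rules of section 5.3 (joint lookup for writes from another device, DITS-only lookup for writes from this CPU) can be expressed as a short sketch. The names `Source`, `stores_to_look_up` and `lookup_hits` are hypothetical and model only the tag compare, not the pipeline timing.

```python
from enum import Enum


class Source(Enum):
    OTHER_DEVICE = "other device"
    THIS_CPU = "this cpu"


def stores_to_look_up(source: Source) -> tuple:
    """Which duplicate tag stores are read for a WRITE seen on the X-BUS."""
    if source is Source.OTHER_DEVICE:
        return ("DITS", "DOTS")      # joint lookup
    return ("DITS",)                 # own writes only look up the DITS


def lookup_hits(ppn: int, dits_tag: int, dots_tag: int, source: Source) -> list:
    """Return the stores whose tag matches the physical page number.

    Each hit would schedule a cache entry invalidate and a DTS entry invalidate.
    """
    tags = {"DITS": dits_tag, "DOTS": dots_tag}
    return [name for name in stores_to_look_up(source) if tags[name] == ppn]


print(lookup_hits(0x155, 0x155, 0x155, Source.OTHER_DEVICE))
print(lookup_hits(0x155, 0x155, 0x155, Source.THIS_CPU))
```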



5.3.2 DTS Lookup: Write Multiple

If the command is decoded and determined to be a WRITE MULTIPLE transaction then the address is
stored in the DTS index. During the following cycle when the corresponding WRITE MULTIPLE DATA is
decoded the first lookup is optionally done if the WRITE MULTIPLE began on an odd longword
boundary. Otherwise, the address is held in the DTSINDEX. Thereafter, the DTSINDEX is loaded with its
former contents plus or minus 8 bytes, depending on whether the WRITE MULTIPLE was ascending or
descending, in anticipation of the next WRITE MULTIPLE DATA cycle.

[Figure: pipeline timing with rows X-BUS, CMD DCD and DTS, each carrying the WM command followed
by its WD data cycles.]



DTS LOOKUP PIPELINE SCHEDULE for WRITE MULTIPLE with TWO DATA TRANSFER CYCLES

CYCLE 1 A WRITE MULTIPLE (WM) transaction on bus.
        The transaction is loaded into the BIF's X-BUS input registers.

CYCLE 2 The command is decoded.
        If it is a WRITE MULTIPLE the address formed to index the DTS is
        loaded into the DTSINDEX register.
        At this time the first quadword of the WRITE MULTIPLE DATA is on the
        X-BUS (WD1).

CYCLE 3 WRITE MULTIPLE DATA is decoded and the address in the DTSINDEX is
        optionally incremented or decremented by 4 bytes.
        The optional odd longword, WD0, lookup occurs.
        If a match occurs schedule cache entry invalidate and DTS entry invalidate.

CYCLE 4 A DTS read access takes place for WD1, the tag is compared to the
        physical address.
        If a match occurs schedule cache entry invalidate and DTS entry invalidate.

CYCLE 5 A DTS read access takes place for WD2, the tag is compared to the
        physical address.
        If a match occurs schedule cache entry invalidate and DTS entry invalidate.
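
The stepping of the DTSINDEX during a WRITE MULTIPLE can be illustrated as follows. This is a simplified sketch: it covers only the plus or minus 8 byte stepping between data cycles and leaves out the optional odd longword lookup; the function name is an assumption.

```python
def write_multiple_lookup_addresses(start_addr: int, data_cycles: int,
                                    ascending: bool = True) -> list:
    """Byte addresses presented to the DTS, one per WRITE MULTIPLE DATA cycle.

    The index register is reloaded with its former contents plus or minus
    8 bytes in anticipation of each following data cycle.
    """
    step = 8 if ascending else -8
    return [start_addr + i * step for i in range(data_cycles)]


# Two data transfer cycles, as in the schedule above.
print([hex(a) for a in write_multiple_lookup_addresses(0x100, 2)])
print([hex(a) for a in write_multiple_lookup_addresses(0x100, 2, ascending=False)])
```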



5.3.3 DTS Lookup Hit Processing

When a memory modify operation by another device causes a hit in either DTS, or a locally generated
write hits in the DITS, two events are scheduled. The first is an invalidate of the entry or entries which
caused the hit in the main cache and the second is an invalidate of that entry or entries in the DTS in
order to maintain the DTS consistent with the main caches.

It usually takes six cycles for a WRITE modifying a memory location which is also in the local caches to
proceed from the X-BUS to that entry being invalidated.

o transaction on X-BUS

o command decoded

o DTS accessed

o PA bus arbitration

o PA BUS/EASRC/PCSRC transfer

o cache tag write(s)



Apollo Confidential                November 9, 1988

EP 0 366 434 A2

The DTS entry invalidate is placed in a queue awaiting a free DTS cycle.

Once a hit has been detected, the hitting index is loaded into the address register of the cache
corresponding to the DTS in which it has hit. The cycle after the DTS lookup is used to complete the
address compare and request use of the PA bus the following cycle. The PA bus will always be
available except when the DTS invalidate pipeline is pre-empted by a READ RESPONSE operation filling
a cache miss (discussed later). The cycle following PA arbitration the index is driven off the BIF
address chip and the drivers to either the PCSRC bus or the EASRC bus or both are enabled by the MMU.
An index hitting in the DITS makes its way to the PC register while one hitting in the DOTS must be
loaded into the EA register. An index hitting in both the DITS and DOTS will be loaded into both EA and
PC registers.



[Figure: block diagram labelled CPU BOARD MSI LOGIC and BIF ADDRESS CHIP.]

CACHE INVALIDATE DATAPATHS (not all bus sources are shown)






DTS HIT WITH CACHE ENTRY INVALIDATE and DELAYED DTS ENTRY INVALIDATE.

CYCLE 1 A WRITE (W) transaction on bus.
        The transaction is loaded into the BIF's X-BUS input registers.

CYCLE 2 The command is decoded.
        The physical address is piped forward for the tag compare.
        The virtual index is loaded into the DTS index register.

CYCLE 3 A read operation is performed on the DTS.

CYCLE 4 The results of the tag compare are available.
        Since there was a hit the PASRC bus is requested.
        The DTS entry invalidate(s) are queued for execution when DTS is available.

CYCLE 5 The virtual index of the location to be invalidated is passed via the
        PASRC bus to the appropriate cache address register.

CYCLE 6 The cache entry causing the DTS hit is invalidated.

5.4 DTS Allocate from Processor Writes

When the CPU modifies an operand cache location via a store instruction the DOTS must also be
updated to reflect the cache's new state. The update occurs after the transaction is placed on the
X-BUS. This avoids DTS conflicts by using the X-BUS as a synchronization point for DTS access. Only
one device can use the X-BUS at a time and that device had to arbitrate to obtain the bus. The only
DTS operations which are not synchronized through the X-BUS are the DTS entry invalidates and those
are lower priority than the rest.

5.4.1 DTS Allocate: Write

When the BIF address chip decodes a WRITE operation on the X-BUS that it has generated, the
following cycle it will write the new tag into the DOTS while doing a lookup into the DITS. The DITS lookup
procedure has been previously described. A hit in the DITS at this point means that the processor
is modifying a location that has been cached in the instruction cache. An instruction cache entry
invalidate and a DITS entry invalidate are scheduled.

While the DTS write allocate is occurring the DTS index must be compared against all the indices in the
DTS entry invalidate queue that are scheduled to invalidate an entry in the DOTS. If any of the
compares succeed then that DTS entry invalidate must itself be invalidated. If the invalidate was
scheduled for both the DITS and DOTS then it is retagged as being only for the DITS. In this way an old
pending DOTS entry invalidate won't destroy a recently allocated entry.
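
The queue check performed during a DOTS write allocate can be sketched as a filter over the pending DTS entry invalidates. The `PendingInvalidate` record and the function name are illustrative assumptions; the real queue is a hardware structure with parallel comparators.

```python
from dataclasses import dataclass


@dataclass
class PendingInvalidate:
    index: int      # DTS index of the entry to invalidate
    dits: bool      # invalidate is scheduled for the DITS
    dots: bool      # invalidate is scheduled for the DOTS


def retag_on_dots_allocate(queue: list, allocated_index: int) -> list:
    """Return the queue after a DOTS allocate at allocated_index.

    A matching DOTS-only invalidate is cancelled; a matching joint
    invalidate is retagged as being only for the DITS.
    """
    result = []
    for entry in queue:
        if entry.dots and entry.index == allocated_index:
            if entry.dits:
                result.append(PendingInvalidate(entry.index, dits=True, dots=False))
            # A DOTS-only invalidate is dropped so it cannot destroy the new entry.
        else:
            result.append(entry)
    return result


pending = [PendingInvalidate(1, True, True), PendingInvalidate(1, False, True)]
print(retag_on_dots_allocate(pending, allocated_index=1))
```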

5.4.2 DTS Allocate: Write Multiple 

A WRITE MULTIPLE from the CPU will be treated just like a WRITE MULTIPLE from another device with
the only difference being that the DOTS is written into with the physical tag rather than read and
checked for tag match.

[Figure: pipeline timing over cycles 1 through 4 with rows X-BUS, CMD DCD and DTS.]

DTS ALLOCATE from PROCESSOR WRITE

CYCLE 1 Processor write is placed on X-BUS from WRITE BUFFER.
CYCLE 2 The write is decoded and also determined to be from the same CPU.
CYCLE 3 The DOTS is updated with the new physical tag and the valid bit set.
        The DITS is checked for a tag compare and if a hit occurs the
        instruction cache entry invalidate and DITS entry invalidate are
        scheduled in the usual way.
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5.5 DTS Allocate from Read Response 

The DTS is also written upon the return of a READ RESPONSE in reply to a READ MULTIPLE made by the
same CPU. When a cacheable miss occurs in a cache a READ MULTIPLE request is sent to main
memory. Main memory returns the requested data in the form of successive READ RESPONSEs.
Upon decoding the expected READ RESPONSE command the BIF sends the associated tag to the
awaiting cache and enters the tag into the DTS using the conventional DTS pipeline. No tag
comparison is performed during this DTS cycle and only the DTS corresponding to the cache that missed is
updated.




DTS INDEX INCREMENT/DECREMENT DATAPATHS. Marked areas indicate off-chip logic.



Three sets of addresses must be stored and manipulated in addressing the DTS: the DTS index
register already mentioned, used in processing WRITE MULTIPLEs, and two registers to hold the
addresses associated with two possible pending cache miss READ RESPONSES.

READ MULTIPLE REQUEST and READ RESPONSE SCENARIO with DTS UPDATE.

CYCLE 1       A cache miss causes the BIF to place a READ MULTIPLE request on
              the X-BUS.

CYCLE 2       The command is decoded and is determined to be a self generated
              READ MULTIPLE.
              The VPN and physical address are stored in the appropriate pending
              operation holding register depending on the X-BUS SUBID signaling
              whether it is an instruction or operand cache miss.

CYCLE 3...N-1 The memory subsystem is processing the READ MULTIPLE.

CYCLE N       The memory subsystem places the first of two READ RESPONSE
              transactions on the X-BUS.

CYCLE N+1     The second READ RESPONSE is on the X-BUS.
              The first READ RESPONSE is decoded and the corresponding address
              is loaded from the holding register to the DTS index. The holding
              register is then loaded with its contents ± 8 bytes depending on the ordering
              for that type of operation (I-miss or D-miss).

CYCLE N+2     The first READ RESPONSE is updating the DTS.
              The second READ RESPONSE is decoded and the contents of the
              holding register are again transferred to the DTS index register and the
              holding register is stepped (± 8 bytes).

CYCLE N+3     The second READ RESPONSE updates the DTS.

CHAPTER 6 WRITE PIPELINE



6.1 Write Buffer Overview 

The purpose of the write buffer is twofold. Firstly, it isolates the processor from memory and bus
latencies during stores and secondly, it reduces overall bus traffic.

The write buffer isolates the processor from memory and bus latencies by offering a high bandwidth
fifo queue for store operations. The processor can submit many back-to-back stores and continue
functioning while this queue is emptied through the X-Bus into memory as both become available.

The write buffer serves to reduce bus traffic by collapsing and grouping small adjacent writes into large
single blocks which make better use of the X-Bus and main memory resources.



[Figure: block diagram labelled CACHE and PROCESSOR.]

The WRITE BUFFER acts as a collapsing fifo queue for stores from the processor to the X-BUS.



6.1.1 FIFO Organization

The write buffer is physically split across the CBA and CBD gate arrays. The CBA holds the address
portion of the queue and the CBD holds the associated data. There are 64 bits of data associated with
every queue address.

The queue is structured as a variable depth FIFO. Entries are added to the bottom of the queue and
removed from the top. The top of the queue is always at a fixed point. The bottom of the queue varies
depending on the current number of queue entries.

There are address comparators at every queue entry. These comparators are used to decide whether
newly arriving write data may be merged with the current queue contents. This write compaction
reduces bus and memory bandwidth requirements. The address comparator is also used to permit
reads to bypass writes. The address comparators indicate any read/write address collisions that would
forbid the bypass.
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6-2 Write Pipelin 




The WRITE BUFFER pipeline showing data and addresses flowing from the processor to the X-BUS
sometimes by way of a fifo queue. The actual number of stages in the queue is yet to be determined.

Queue entries are actually not unloaded until a successful X-Bus acknowledge is seen. Transmit
Bypass is used when a second or successive X-Bus write is initiated before that first acknowledge.
Transmit Bypass picks the first untransmitted queue entry as the next address or data to send. The
transmit bypass is not shown in the figure.



6.2 Write Address/Data Staging

The processor store data is captured from the cache DATA bus during the store's access stage.
Typically, the address will follow in the next cycle on the PA bus. If the PA bus is not available in that
cycle, or there is a processor EVALID stall in effect, the data is held in place by the MMU deasserting
the MMU_HDATA_LO signal.

As in the prior figure, there are two inbound data staging registers and one address staging register
before the write queue proper. One data staging register is to compensate for the early data arrival.
The second, and the address staging register, are to allow the address comparisons to take place and
control the load enables in the queue. The address comparisons condition whether the store data may
be merged with data already present.

6.3 Write Queue Contents 

In addition to holding the data, each CBD data queue entry has MSHALF_VALID and LSHALF_VALID flags.
The valid bits are used to determine whether there are any contents in the entry. LSHALF_VALID and
MSHALF_VALID are also used to control the output write rotation needed for a 32 bit or smaller write to
an even longword address. There is a NOSWAP flag which defeats that output write rotation in the
case of the MMU which already rotates the data properly. If MSHALF_VALID and LSHALF_VALID are
both valid, during the address phase of a write multiple transfer, a "2" is sourced with correct parity

MS_VALID  LS_VALID  NO_SWAP

   0         0         -      EMPTY
   1         0         0      EVEN LONG
   1         0         1      EVEN LONG - MMU
   0         1         -      ODD LONG
   1         1         -      QUAD
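
The flag table above maps directly onto a small decoding function. The sketch below is only an illustration of that table; the function name and the use of Python booleans for the flags are assumptions.

```python
def classify_data_entry(ms_valid: bool, ls_valid: bool, no_swap: bool = False) -> str:
    """Decode the CBD data queue flags into the entry type from the table."""
    if ms_valid and ls_valid:
        return "QUAD"
    if ms_valid:
        # NO_SWAP only matters for an even longword; it marks MMU data
        # that has already been rotated.
        return "EVEN LONG - MMU" if no_swap else "EVEN LONG"
    if ls_valid:
        return "ODD LONG"
    return "EMPTY"


for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, classify_data_entry(*flags))
```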



In addition to holding the address, the CBA address queue holds 4 BYTE_VALID bits in addition to the
MSHALF_VALID and LSHALF_VALID flags. LSHALF_VALID is almost address bit 2, and the four byte
valid bits correspond to the 4 bit byte mask required for a 32 bit bus write. The CBA sources these
onto the X-Bus during the address phase of a write or write multiple. There is no need for the
NO_SWAP bit.

MS_VALID  LS_VALID  BYTE_VALID

   0         0                  EMPTY
   1         0         BBBB     EVEN LONG
   0         1         BBBB     ODD LONG
   1         1                  QUAD



The CBA IC also has other flags that control internal arbitration and write compaction. There are
NOCACHE, UNLOCK, INVTLBALL, and INVTLBE flags associated with each address. Any of these flags
being set inhibits write compaction and read around write. UNLOCK also releases the bus lock if the
nesting level is 0 and this CBA holds the lock. The invalidate TB flags force the selection of the TB
Invalidate bus command.

6.4 Write Queue Loading

Unless the queue is full, processor stores are accepted and added to the queued data without stalling
the CPU. Typically, the store's data and address are added simultaneously to the bottom of the
address and data queues. The position of the queue's bottom is determined by the first queue entry
which is empty, measured from the queue's top. The affiliated flags are set.

6.4.1 Load Merge

If cacheable store data is being added to the queue, and the last valid entry in the queue is also
cacheable and agrees in the quadword address, the load data may be merged into that entry. The
merging would logically OR the valid bits. The merging can always happen if the data to load is a
longword or quadword quantity. The merging may be permitted if the data to load is a byte or word in
length. The merging will be allowed if the queue entry is already a quadword, or if the merge result will
not spill over into the second longword.
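
One possible reading of the load merge rule is sketched below. Several points are assumptions made for the illustration: the `QueueEntry` record, the mapping of address bit 2 to the LS half (suggested by the remark that LSHALF_VALID is almost address bit 2), and the interpretation of "not spill over into the second longword" as a byte or word store that lands in a longword half which is already valid.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    quad_addr: int     # byte address with the low 3 bits removed
    cacheable: bool
    ms_valid: bool     # even longword present
    ls_valid: bool     # odd longword present


def can_merge(entry: QueueEntry, store_addr: int, store_size: int,
              store_cacheable: bool) -> bool:
    """Decide whether a store may merge into the last valid queue entry."""
    if not (entry.cacheable and store_cacheable):
        return False
    if (store_addr >> 3) != entry.quad_addr:
        return False                      # different quadword
    if store_size >= 4:
        return True                       # longword or quadword always merges
    if entry.ms_valid and entry.ls_valid:
        return True                       # entry is already a quadword
    targets_ls_half = bool(store_addr & 0x4)
    # Byte or word: allowed only if it stays within the half already valid.
    return entry.ls_valid if targets_ls_half else entry.ms_valid


def merged_valid_bits(entry: QueueEntry, store_addr: int, store_size: int) -> tuple:
    """Logically OR the valid bits, returning (ms_valid, ls_valid)."""
    if store_size == 8:
        return True, True
    targets_ls_half = bool(store_addr & 0x4)
    return (entry.ms_valid or not targets_ls_half, entry.ls_valid or targets_ls_half)


even_long = QueueEntry(quad_addr=0x100 >> 3, cacheable=True, ms_valid=True, ls_valid=False)
print(can_merge(even_long, 0x104, 4, True), merged_valid_bits(even_long, 0x104, 4))
```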

6.4.2 Write Buffer Full 

When the last entry in the write queue is occupied, and the inbound data address register is occupied
or about to be (MEM_CMD is requesting the use), the signal WBUF_FULL is sent to the MMU to prevent
any further stores from advancing. If there is a store currently in its cache access stage, that
store's data will be captured and held, but will freeze in its EXC stage.

The signal WBUF_FULL will be deasserted when the write queue next advances. Note that there's
some magic here when a store is stuck with its MEM_CMD asserted.



6.5 Write Queue Unloading 

The queue entries are not unloaded until the cycle after receiving a successful acknowledge for the 
address or data transfer on the X-Bus. If retry is required, the address/data is then still available in the 
write queue. 

Write addresses are always taken from the write address queue. Only reads will use the fast pass
address paths from the MMU. The fast pass paths are for quick posting of read miss addresses in the
event of default bus ownership.



6.5.1 Transmit Bypass

The address or data to send on the X-Bus is normally at the top of the queue. If however, the top of
queue has been transmitted but not acknowledged, the next to top of queue would be used. During
write multiples, queue data is being transmitted every cycle. Since the queue must be accessed the
cycle before the X-Bus transmission, and the queue unload occurs in the third cycle after the X-Bus
transmission, 4 levels of transmit data bypassing are required. The four levels of bypassing allow
reaching back to the fifth queue entry from the top. This is illustrated in the next figure.



ACCESS      D1   D2   D3   D4   D5
TRANSMIT         D1   D2   D3   D4
PEND                  D1   D2   D3
ACK                        D1   D2
UNLOAD                          D1

TRANSMIT BYPASSING



An additional level of transmit bypassing is provided in the address queue output delivery. This allows
a level of address look-ahead that permits an early detection of write multiples. The write multiple
gets ahead when the first X-Bus cycle transmits only an address, no data. This one cycle gap is
enough to let the address transmit bypass sneak ahead of the data by one cycle.

Transmit bypass requires a sent flag be associated with the top 3 data and top 4 address queue
entries. A queue entry is bypassed if it is already sent, or the queue element in front of it is already
sent and there is a transfer on the bus now.
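
A literal reading of the bypass rule can be written as a selection over the sent flags. This is a sketch under stated assumptions: the list models the queue from the top downward, the function name is hypothetical, and the case of a transfer in progress for an unsent top entry is not covered by the rule and so is not modelled.

```python
from typing import List, Optional


def next_to_transmit(sent_flags: List[bool], transfer_on_bus_now: bool) -> Optional[int]:
    """Return the position (0 = top of queue) of the next entry to send.

    An entry is bypassed if it is already sent, or if the entry in front of
    it is already sent and a transfer is on the bus now.
    """
    for position, sent in enumerate(sent_flags):
        front_sent = position > 0 and sent_flags[position - 1]
        if sent:
            continue
        if front_sent and transfer_on_bus_now:
            continue        # this entry is the one currently on the bus
        return position
    return None             # nothing left to transmit


print(next_to_transmit([True, False, False], transfer_on_bus_now=True))
print(next_to_transmit([True, False, False], transfer_on_bus_now=False))
```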



6.5.2 Transmit Retry

If a data or address X-Bus transfer receives an error or busy acknowledge, all queue element sent bits
are reset. The requests are retried. The REJECT signal may also be asserted.

6.5.3 Write Multiple Collapse

If the next address to send is for a quadword, a WRITE MULTIPLE command is sent. While the address
is being transmitted on the X-Bus, the next queue address is checked to see if it's also a quadword
and in an adjacent quadword.

The adjacency direction is suggested by the write queue by examining the lower order bits of the next
two addresses to transmit.

Write multiples are arbitrarily broken up on 256 byte boundaries to prevent bus hogging.
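
The collapse test can be sketched as a check on two consecutive queue addresses. The function name and arguments are assumptions, and the 256 byte rule is interpreted here as "the next quadword must lie in the same 256 byte block as the current one".

```python
def can_extend_write_multiple(current_addr: int, next_addr: int,
                              next_is_quad: bool, ascending: bool = True) -> bool:
    """Decide whether the next queue entry joins the current WRITE MULTIPLE."""
    if not next_is_quad:
        return False
    step = 8 if ascending else -8
    if next_addr != current_addr + step:
        return False                              # not the adjacent quadword
    # Break the burst at 256 byte boundaries to prevent bus hogging.
    return (next_addr >> 8) == (current_addr >> 8)


print(can_extend_write_multiple(0x100, 0x108, True))   # adjacent, same block
print(can_extend_write_multiple(0x0F8, 0x100, True))   # adjacent, crosses boundary
```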



6.6 Read Around Write

If an instruction cache read is posted, the read is free to pass around previously queued up writes.

If a data cache read is posted, the read is free to pass around previously queued up writes only if the
address doesn't collide with a pending write. The write queue detects this address collision and
reports it to the internal BIF arbitration logic. The read bypass is inhibited if there are read and write
side-effects as well, see chapter 2.

6.7 Write Parity

Parity for both address and data is regenerated just before X-Bus transmission.
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CHAPTER 7 REGISTERS 



7.1 Interrupt Posting 

There are 16 interrupt posting longword addresses to which the BIF responds as a destination. The
addresses are in subsequent longwords.

INTERRUPT POSTING ADDRESS                00pp 0100 to 00pp 013C

 31                                   00
 DATA NOT INTERPRETED                    WRITE ONLY

pp = PROCESSOR SELECT NUMBER
     00, 04, 08, 0C  10, 14, 18, 1C
     20, 24, 28, 2C  30, 34, 38, 3C

Interrupts are always accepted by the processor to which they are directed. The interrupt originator
receives no acknowledge. In effect, storing to an interrupt posting address simply requests an
interrupt in the destination processor. There are 16 interrupt classes. The lower numbered interrupt
posting address corresponds to the lower numbered interrupt class.
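
The posting address for a given processor and interrupt class follows from the address range above. The sketch assumes that "00pp 0100" denotes a 32 bit hexadecimal address with pp in bits 23 through 16; the function name is illustrative.

```python
def interrupt_posting_address(processor_select: int, interrupt_class: int) -> int:
    """Longword address that requests interrupt_class in the selected processor."""
    if not 0 <= interrupt_class <= 15:
        raise ValueError("interrupt class must be 0..15")
    if processor_select % 4 != 0 or not 0 <= processor_select <= 0x3C:
        raise ValueError("processor select must be one of 00, 04, ..., 3C")
    # Sixteen subsequent longwords starting at offset 0100.
    return (processor_select << 16) | 0x0100 | (interrupt_class << 2)


print(hex(interrupt_posting_address(0x04, 0)))
print(hex(interrupt_posting_address(0x3C, 15)))
```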



7.2 Interrupt Control Register 

Associated with each interrupting address in an AT processor are both an interrupt enable and an
interrupt pend flag. These 2 bits are available in the interrupt control register, ICTRL. The register
should be read and written only as a longword quantity.

INTERRUPT CONTROL (ICTRL)                00pp 0208

 31 30            16 15            00
    IENAB[14:00]     IPEND[15:00]

IENAB = INTERRUPT ENABLES FOR INTERRUPT CLASSES 0 TO 14     READ, WRITE 1 TO XOR
IPEND = INTERRUPT REQUESTS FOR INTERRUPT CLASSES 0 TO 15    READ ONLY

N.B. INTERRUPT CLASS 15 IS ALWAYS ENABLED

pp = PROCESSOR SELECT NUMBER
     00, 04, 08, 0C  10, 14, 18, 1C
     20, 24, 28, 2C  30, 34, 38, 3C



The interrupt pend bit is set when a write to the associated interrupting address is detected. The
pended interrupt will be responded to when its specific interrupt enable bit is set and there is no
comprehensive trap masking otherwise in effect. The highest priority enabled interrupt pend bit is
cleared automatically upon the processor reading the interrupt summary register. The corresponding
interrupt enable bit will also be cleared simultaneously.

The interrupt enable bits may be set and cleared directly by processor writes to the ICTRL register.
Storing to the ICTRL register loads the interrupt enable portion of the register with the XOR of the
current register contents and the store data. This permits the needed selective updates of register
contents.
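
The XOR update of the interrupt enables can be illustrated on the 15 bit IENAB field alone. The helper names are assumptions, and the sketch works on the field value rather than on the full register image.

```python
IENAB_MASK = 0x7FFF   # enables for interrupt classes 0 to 14


def store_to_ienab(current_ienab: int, store_data: int) -> int:
    """New enable field after a store: XOR of current contents and store data."""
    return (current_ienab ^ store_data) & IENAB_MASK


def store_data_for(current_ienab: int, desired_ienab: int) -> int:
    """Store data that turns the current enables into the desired enables."""
    return (current_ienab ^ desired_ienab) & IENAB_MASK


current = 0b0101
# Writing a 1 toggles the matching enable; writing a 0 leaves it alone.
print(bin(store_to_ienab(current, 0b0011)))
```

Because only the bits written as 1 change, software can enable or disable a single class without first masking the others.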

7.2.1 Non Maskable Interrupt

Interrupt level 15 cannot be masked.

7.3 Interrupt Summary Register

The interrupt summary register identifies the highest priority interrupt that is both pending and
enabled. If no interrupt is pending, ISUM<4:0> will be zero. The register should be read only as a
longword quantity.



INTERRUPT SUMMARY REGISTER (ISUM)                           00pp 0200

 31          05 | 04 | 03      00
                | I  |   ISUM

ISUM = HIGHEST INTERRUPTING LEVEL                           READ ONLY
I    = 1 -> ENABLED INTERRUPT PENDING

N.B. READING CLEARS IPEND(ISUM) AND IENAB(ISUM)

PP = PROCESSOR SELECT NUMBER
     00, 04, 08, 0C, 10, 14, 18, 1C
     20, 24, 28, 2C, 30, 34, 38, 3C



7.4 Bus Control Register

The bus control register permits operational code access to the DTS force hit and miss functions. In
addition, the BCTRL register captures overall state of any software recoverable error detected by the
BIF. The register should always be read and written only as a longword quantity.

The Hi and Ho bits force the duplicate instruction and data/operand tag stores to hit when a lookup for
an X-Bus write is in progress. The Mi and Mo bits force that lookup to miss. The operation when both
the force hit and force miss bits for the same duplicate tag store are set is undefined.

The En and El bits are the trap enables for bus write no response and bus lock timeout respectively.
When either trap is pending, whether enabled or not, the corresponding W or L bit will also be set. The
trap must be explicitly acknowledged in software by writing 0's to W and L. Setting W or L nonzero
while the associated trap is enabled will trigger an IP trap. Breaking a lock by trap dispatch will not be
recorded as a lock timeout.

BUS CONTROL REGISTER (BCTRL)                                00pp 0210

 31 30 29 28 27 26 25 ... 06 05 ... 02 01 00
                             NORESP

Hi = 1 -> FORCE HIT, DITS                                   READ/WRITE
Ho = 1 -> FORCE HIT, DOTS                                   READ/WRITE
Mi = 1 -> FORCE MISS, DITS                                  READ/WRITE
Mo = 1 -> FORCE MISS, DOTS                                  READ/WRITE
En = 1 -> ENABLE BUS NO RESPONSE TRAP                       READ/WRITE
El = 1 -> ENABLE LOCK TIMEOUT TRAP                          READ/WRITE
W  = 1 -> BUS WRITE NO RESPONSE TRAP PENDING                READ/WRITE
L  = 1 -> LOCK TIMEOUT TRAP PENDING                         READ/WRITE

NORESP                                                      READ/WRITE
     0000  NO ADDRESS CAPTURED
     1--0  READ ADDRESS CAPTURED
     -1-0  WRITE ADDRESS CAPTURED
     --10  FETCH ADDRESS CAPTURED
     1--1  READ ADDRESS CAPTURED, SUBSEQUENT NO RESPONSE
     -1-1  WRITE ADDRESS CAPTURED, SUBSEQUENT NO RESPONSE
     --11  FETCH ADDRESS CAPTURED, SUBSEQUENT NO RESPONSE

PP = PROCESSOR SELECT NUMBER
     00, 04, 08, 0C, 10, 14, 18, 1C
     20, 24, 28, 2C, 30, 34, 38, 3C



The NORESP field indicates what address has been captured in the ERRADDR register. This field will
usually be zero except after a no response ack on the X-Bus. When this field becomes non-zero,
whether by software action or because of no bus response, the ERRADDR register ceases to clock. If
multiple failures to respond have occurred, the LSB of the field will be set. The remaining bits and the
ERRADDR will reflect only the first failure. The lack of bus acknowledge will result in either a write no
response trap from the BIF, or a trap from the MMU. The NORESP field should be zeroed by the
handler after the ERRADDR has been recovered.
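A handler might decode the field along the following lines. This is a sketch under stated assumptions: NORESP is treated as a 4 bit value whose most significant bit marks a read, the next a write, the next a fetch, and whose LSB marks a subsequent failure to respond, as the register table suggests. The function name and the returned strings are hypothetical.

```python
def decode_noresp(field):
    """Return (transfer_type, repeated) for a 4 bit NORESP value.

    transfer_type is "none", "read", "write", "fetch" or "unknown";
    repeated is True when the LSB flags a subsequent no response.
    """
    field &= 0xF
    if field == 0:
        return ("none", False)
    repeated = bool(field & 0x1)
    if field & 0x8:
        kind = "read"
    elif field & 0x4:
        kind = "write"
    elif field & 0x2:
        kind = "fetch"
    else:
        # Only the LSB is set, e.g. after a software write to the field.
        kind = "unknown"
    return (kind, repeated)
```

After reading ERRADDR, the handler would write zero back to NORESP so that the address register resumes clocking.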



7.5 Bus Error Address

The physical address of any read, write or fetch request that receives no bus acknowledge upon
transfer is captured in the bus error address register, ERRADDR. The register begins clocking again
only after software has cleared the NORESP field of the BCTRL register. This field also associates the
ERRADDR register contents with the transfer type. The ERRADDR register format follows.

BUS ERROR ADDRESS REGISTER (ERRADDR)                        00pp 0218

 31 30 | 29            02 | 01 00
       |  ADDRESS(29:02)  |

READ ONLY

PP = PROCESSOR SELECT NUMBER
     00, 04, 08, 0C, 10, 14, 18, 1C
     20, 24, 28, 2C, 30, 34, 38, 3C

The captured error address may not correspond directly to the program requested address because
of cache fill address zeroing, or write merging.

7.6 BIF Buried/Scan State 

Buried state, state readable and writable under scan control only, is provided in the BIF. Some of the
state is needed for functional operation, e.g. the board ID. Some of the state is used to selectively
disable various accelerators in the BIF. This latter state is used for diagnostic assistance.



7.6.1 Board ID 

There is a four bit board identifier field, BD_ID(3:0), in the scan ring. The field is used for slave
address decoding and read address source ID. The lower two bits also decide which class B
arbitration level this IC is operating on.

This field is only in the CBA gate array.

7.6.2 Arbitration Level

There is a two bit arbitration level field, ARB_LEVEL(1:0), in the scan ring. The field should be set to
the same value as BD_ID(1:0). It is used to decide which class B arbitration level this IC is operating
on in the CBD IC's.

This field is in the CBD gate arrays.



7.6.3 Write Multiple Inhibit 

There is a one bit WRITE_MULTIPLE_INHIBIT bit in the scan ring. When set, the BIF will not generate
write multiples other than quadwrites.

This field is only in the CBA gate array.

7.6.4 Write Merge Inhibit

There is a one bit WRITE_MERGE_INHIBIT bit in the scan ring. When set, the BIF will not generate write
multiples other than quadwrites.

This field is only in the CBA gate array.

7.6.5 Read Before Write Inhibit

There is a one bit READ_BEFORE_WRITE_INHIBIT bit in the scan ring. When set, the BIF will not permit
data cache reads to precede data cache writes.

This field is only in the CBA gate array.

7.6.6 Write Holdoff Inhibit

There is a one bit WRITE_HOLDOFF_INHIBIT bit in the scan ring. When set, the BIF will issue queued
writes as soon as possible.

This field is only in the CBA gate array.

7.6.7 Instruction Cache Parity Inhibit

There is a one bit NO_ICACHE_PARITY bit in the scan ring. When set, the BIF will never check
instruction cache data parity.

This field is only in the CBD gate arrays.

7.6.8 Data Cache Parity Inhibit

There is a one bit NO_DCACHE_PARITY bit in the scan ring. When set, the BIF will never check data
cache data parity.

This field is only in the CBD gate arrays.

7.6.9 DTS Parity Inhibit

There is a one bit NO_DTS_PARITY bit in the scan ring. When set, the BIF will never check parity in the
DITS or DOTS.

This field is only in the CBA gate array.

7.6.10 Force Parity Sense 

There are two FORCE_PARITY(1:0) bits in the scan ring. When zero, the BIF will generate normal
parity. When nonzero, the BIF will force all output parity to 1's or 0's in the DITS, DOTS, instruction
and data caches. FORCE_PARITY = 10 generates 0's. FORCE_PARITY = 11 generates 1's.

This field is present in both the CBA and CBD gate arrays. The CBA field controls simultaneously both
the DITS and DOTS parity. The CBD field controls both the instruction cache data and data cache data
parity.
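The effect of the two scan bits on a generated parity bit can be sketched as follows. The function names are hypothetical, the model is software only, and code 01 is rejected because the text does not describe it.

```python
def odd_parity_bit(value, width):
    """Parity bit that makes the total number of ones (data plus parity) odd."""
    ones = bin(value & ((1 << width) - 1)).count("1")
    return 0 if ones % 2 else 1


def generated_parity(value, width, force_parity):
    """Parity bit driven for `value` under a FORCE_PARITY(1:0) setting."""
    if force_parity == 0b00:
        return odd_parity_bit(value, width)   # normal operation
    if force_parity == 0b10:
        return 0                              # force all 0's
    if force_parity == 0b11:
        return 1                              # force all 1's
    raise ValueError("FORCE_PARITY code 01 is not described in the text")
```

Forcing parity to a fixed sense lets a diagnostic deliberately provoke a parity error and confirm that the checking logic reports it.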

7.6.11 DTS Parity Error 

There is a one bit DTS_PARITY_ERR bit in the scan ring. It's set when a DTS parity error is detected
and remains set until cleared under scan control. When set, the BIF will request the clocks to stop.

This bit is only in the CBA gate array.

7.6.12 Instruction Cache Parity Error

There is a one bit INST_PARITY_ERR bit in the scan ring. It's set when an instruction cache data parity
error is detected and remains set until cleared under scan control. When set, the BIF will request the
clocks to stop.

This bit is only in the CBD gate array.

7.6.13 Data Cache Parity Error

There is a one bit DATA_PARITY_ERR bit in the scan ring. It's set when a data cache data parity error
is detected and remains set until cleared under scan control. When set, the BIF will request the clocks
to stop.

This bit is only in the CBD gate array.

7.6.14 X-BUS Overlap Control 

There is a one bit ONE_ATATIME bit in the scan ring. When set, the BIF will not issue a second X-Bus
reference before the last is fully complete. For a write, that means a successful ACK. For a read, that
means a successful read data return.

This field is only in the CBA gate array.

7.6.15 Retry Backoff Inhibit

There is a one bit NO_BACKOFF bit in the scan ring. When set, the BIF will reissue retry requests as
soon as possible.

This field is only in the CBA gate array.

7.6.16 Read Response Error

There is a READ_RESPONSE_ERROR bit in the scan ring. It's set when the BIF accepts a READ
RESPONSE which triggers an error acknowledge. Typically, this would be a parity error. The bit remains
set until cleared under scan control. When set, the BIF will request the clocks to stop.

This field is only in the CBD gate arrays.

7.6.17 Arbitration Timeout

There is an ARB_TIMEOUT bit in the scan ring. It's set when the BIF's arbitration timer elapses before
acquiring the X-Bus. The bit remains set until cleared under scan control. When set, the BIF will
request the clocks to stop.

This field is only in the CBA gate array.

7.6.18 Read Return Timeout

There is a READ_RETURN_TIMEOUT bit in the scan ring. It's set when the BIF's read return timer
elapses before an expected READ RESPONSE arrives. The bit remains set until cleared under scan
control. When set, the BIF will request the clocks to stop.

This field is only in the CBA gate array.

7.6.19 Error Acknowledge

There is an ERROR_ACKNOWLEDGE bit in the scan ring. It's set when the BIF receives an error
acknowledgement to an address transfer. It's also set when a no acknowledge response to a data
transfer cycle of a write multiple occurs. The bit remains set until cleared under scan control. This bit
does not request clock stopping.

This field is only in the CBA gate array.

7.6.20 DTS RAM Diagnostic Address Generation 

There is a one bit DTS_DIAGADDR bit in the scan ring. When set, the BIF CBA will generate increasing
DTSINDEX addresses. These addresses are used for the selftest of the DTS and the primary cache
RAM's. See chapter 9.

This bit is only in the CBA gate array.

7.6.21 DTS Diagnostic Data Generation Control

There is a one bit DTS_DATALD bit in the scan ring. It is used to control the source of data for writing
and comparison during the DTS selftest. See chapter 9.

This bit is only in the CBA gate array.

7.6.22 DTS Diagnostic Data Writing Control

There is a one bit DTS_DIAGWE bit in the scan ring. When set, diagnostic data will be written into the
DTS RAM's every cycle. See chapter 9.

This bit is only in the CBA gate array.

7.6.23 DTS Diagnostic Error

There is a one bit DTS_TESTERR bit in the scan ring. It is set if there is a miscompare during the
selftest of the DTS RAM's. See chapter 9.

This bit is only in the CBA gate array.

7.6.24 Cache Diagnostic Data Generation Control

There is a one bit CACHE_DATALD bit in the scan ring. It is used to control the source of data for
writing and comparison during the cache data selftest. See chapter 9.

This bit is in the CBD gate arrays.

7.6.25 Cache Diagnostic Data Writing Control

There is a one bit CACHE_DIAGWE bit in the scan ring. When set, diagnostic data will be written into
the cache data RAM's every cycle. See chapter 9.

This bit is in the CBD gate arrays.

7.6.26 Cache Diagnostic Error

There is a one bit CACHE_TESTERR bit in the scan ring. It is set if there is a miscompare during the
selftest of the cache data and parity RAM's. See chapter 9.

This bit is in the CBD gate arrays.



7.7 IP Trapping

A three bit trap code is sent from the BIF to the IP. There are only five useful codes. BIF ERROR is
either a write bus no response acknowledge or lock timeout. The BCTRL register must be read to
determine which.

BUS_TRAP_REQ(2:0)

000  NO REQUEST
001  BIF ERROR
010  INTERRUPT
011  BIF ERROR/INTERRUPT
1--  NMI
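The five useful codes can be decoded as in the sketch below. The function name and the returned strings are hypothetical; the encoding is the one listed in the table, with any code whose top bit is set treated as NMI.

```python
def decode_bus_trap_req(code):
    """Map a 3 bit BUS_TRAP_REQ(2:0) value to the request it signals."""
    code &= 0x7
    if code & 0x4:
        # 1-- : the two low bits are don't-care.
        return "NMI"
    names = ["NO REQUEST", "BIF ERROR", "INTERRUPT", "BIF ERROR/INTERRUPT"]
    return names[code]
```

When the result is BIF ERROR, software still has to read BCTRL to tell a write no response from a lock timeout.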



Whenever the IP initiates a trap sequence, the signal IP_TRAP_DISP will be asserted. The assertion of
this signal will unconditionally release the bus lock.

8.1 Instruction Cache Data Parity

The BIF CBD IC's maintain and check parity on the 64 bits of the instruction cache data RAM's. There
is one parity bit over each 32 bits. INST_PARITY(0) holds parity over all even bytes of the INST bus.
INST_PARITY(1) holds parity over all odd bytes of the INST bus. This odd even division permits one bit
to be maintained per CBD gate array.

Odd parity is maintained, that is the sum of all ones in the 32 bits of data plus the parity bit should be
odd.
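The even and odd byte split can be modelled in a few lines. This is a sketch only: it assumes byte 0 is the most significant byte of the 64 bit INST word, which the text does not state, and the function names are hypothetical.

```python
def _odd_parity(byte_values):
    """Parity bit that makes the total count of ones, parity included, odd."""
    ones = sum(bin(b).count("1") for b in byte_values)
    return 0 if ones % 2 else 1


def inst_parity(inst64):
    """Return (INST_PARITY(0), INST_PARITY(1)) for a 64 bit INST word.

    INST_PARITY(0) covers the even numbered bytes and INST_PARITY(1)
    the odd numbered bytes, 32 data bits each.
    """
    data = inst64.to_bytes(8, "big")   # assumed: byte 0 is most significant
    return (_odd_parity(data[0::2]), _odd_parity(data[1::2]))
```

Each parity bit depends only on the bytes handled by one data chip, which is what allows one bit to be maintained per gate array.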

INST_PARITY(1:0) are bidirectional. There is one 16Kx4 RAM devoted to holding the parity. The parity
RAM is always accessed in the cycle after the instruction cache's data RAM's. The address is piped
forward unconditionally in external registers. The instruction parity is always good.

8.1.1 Instruction Parity Checking

The parity is always checked on the INST bus unless the CBD gate array is driving it. The CBD gate
arrays drive it only during instruction cache miss.

The parity is checked in the cycle of the instruction parity RAM access. If a parity error is detected, a
hardware fault is assumed. The CBD gate array requests the SCR to halt the system clocks and
freezes error status in the embedded scan state.

8.1.2 Instruction Parity Generation

When instruction cache fill is underway, instruction parity is computed from the X-Bus parity. The 8
X-Bus parity bits are reduced to 2. These 2 parity bits are loaded into an outbound instruction parity
register for sourcing onto INST_PARITY(1:0) the cycle after the instruction data. If the instruction
cache's data RAM's are being written, the parity RAM will be written unconditionally in the cycle to
follow.

Embedded state may force the INST_PARITY(1:0) bits to always be 1, or always be 0.

Diagnostic RAM update, see chapter 9, mimics an extended instruction cache fill. Parity will typically
be part of the diagnostic pattern generation.

8.2 Data Cache Data Parity 

The BIF CBD IC's maintain and check parity on the 64 bits of the data cache data RAM's. There is one
parity bit over each 8 bits. This is forced by the need to update bytes individually. DATA_IPARITY(0)
provides parity over DATA(63:56). DATA_IPARITY(7) holds parity over DATA(07:00). Each CBD gate
array is responsible for 4 parity bits.

Odd parity is maintained, that is the sum of all ones in the 8 bits of data plus the parity bit should be
odd.

There are 8 16Kx1 RAM's devoted to holding the parity. The RAM's have separate data in and out
pins. Correspondingly, there are separate DATA_IPARITY(7:0) and DATA_OPARITY(7:0) signals. The
parity RAM's are always accessed in the cycle after the data cache's data RAM's. The address is
piped forward unconditionally in external registers. The data parity is always good.

8.2.1 Parity Checking 

The parity is checked on the DATA bus when the signal CHECK_DATA is asserted. This signal is
externally derived from the RAM controls of the data cache. This signal should be asserted to the CBD
IC's in the cycle after reading the data RAM's whenever the RAM's are read. That should be most of
the time except during processor stores and data cache filling.

The parity is checked in the cycle of the data parity RAM's access using DATA_IPARITY(7:0). If a
parity error is detected, a hardware fault is assumed. The CBA gate array requests the SCR to halt the
system clocks and freezes error status in the embedded scan state.

8.2.2 Parity Generation

Parity is always provided by the CBD. When a data cache fill is underway, data parity is passed directly
from the X-Bus parity. These 8 parity bits are loaded into an outbound instruction parity register for
sourcing onto DATA_OPARITY(7:0) the cycle after the data. Parity is also always being computed on
the DATA bus directly. When a cache data fill is not underway this parity is sourced onto the
DATA_OPARITY(7:0) instead. If the data cache's data RAM's are being written, the parity RAM's will
be written unconditionally in the cycle to follow.

Embedded state may force the DATA_OPARITY(7:0) bits to always be 1, or always be 0.

Diagnostic RAM update, see chapter 9, mimics an extended data cache fill. Parity will typically be part
of the diagnostic pattern generation.

8.2.3 Secondary TB Data Parity 

The CBD IC's are unaware of whether a secondary TB lookup, or a data cache read, is underway in the
data cache.

8.3 Instruction Cache Duplicate Tag Store Parity

The CBA IC maintains and checks parity on the 18 bits of the DITS' RAM's. There is one parity bit over
all 18 bits, DITS_PARITY.

Odd parity is maintained, that is the sum of all ones in the 18 bits of data plus the parity bit should be
odd.

DITS_PARITY is bidirectional and is accessed in the same cycle as the tag contents. The DITS parity is
always good.

8.3.1 Parity Checking

The parity is always checked on the DITS_DATA(29:12) unless the CBA gate array is sourcing it. The
CBA gate array does so only in association with the READ RESPONSE phases of an instruction cache
fill's READ MULTIPLE, or during a DITS entry invalidation cancellation.

The parity is checked in the cycle after the RAM access. This may change if timing permits. If a
parity error is detected, a hardware fault is assumed. The CBD gate array requests the SCR to halt the
system clocks and freezes error status in the embedded scan state.

8.3.2 Parity Generation 

Two cycles after the READ RESPONSE to an instruction cache miss's READ MULTIPLE, the DITS is
being updated. The DITS is also updated during RAM diagnostic operation and during entry
invalidation. In all cases, parity is generated the cycle before the RAM write.

Embedded state may force the DITS_PARITY to always be 1, or always be 0.

8.4 Data Cache Duplicate Tag Store Parity

The CBA IC maintains and checks parity on the 18 bits of the DOTS' RAM's. There is one parity bit
over all 18 bits, DOTS_PARITY.

Odd parity is maintained, that is the sum of all ones in the 18 bits of data plus the parity bit should be
odd.

DOTS_PARITY is bidirectional and is accessed in the same cycle as the tag contents. The DOTS parity
is always good.

8.4.1 Parity Checking

The parity is always checked on the DOTS_DATA(29:12) unless the CBA gate array is sourcing it. The
CBA gate array does so only in association with the READ RESPONSE phases of a data cache fill's
READ MULTIPLE, during DOTS entry invalidation cancellation, or after a cacheable local store.

The parity is checked in the cycle after the RAM access. This may change if timing permits. If a
parity error is detected, a hardware fault is assumed. The CBA gate array requests the SCR to halt the
system clocks and freezes error status in the embedded scan state.

8.4.2 Parity Generation

Two cycles after the READ RESPONSE to a cacheable data cache miss's READ MULTIPLE, the DOTS
is being updated. The DOTS is also updated during RAM diagnostic operation and during entry
invalidation. Finally, the DOTS is updated two cycles after a locally generated cacheable write is
transferred on the bus. In all cases, parity is generated the cycle before the RAM write.

Embedded state may force the DOTS_PARITY to always be 1, or always be 0.

CHAPTER 9

RAM SELFTEST

9.1 DTS RAM Diagnosis



The BIF CBA provides assistance for the accelerated and simultaneous testing of both the DITS and
DOTS external RAM's and address logic. Four functions are provided that may operate at functional
clock speed.

• 16K entries may be set to a fixed value.

• 16K entries may be read and compared against a fixed value.

• 16K entries may be set to an incrementing value. A modulo 15 counter is used. Data is
replicated every 4 bits.

• 16K entries may be read and compared against an incrementing value. Data is replicated
every 4 bits.

The parity bits associated with these RAM's may also be controlled. They may be jointly forced to 1,
jointly forced to 0, or allowed to operate functionally. During the read and compare mode, the
similarly controlled parity is checked for.

Only the CBA and the external MSI should be clocked while in this test mode. The number of locations
to fill, or check, is decided by the burst count field in the SCR.

9.1.1 DTS Address Generation 

Setting the DTS_DIAGADDR bit in the CBA scan path will cause an alternative DTS address source to be
used. The address will begin at the value scanned into the DTSINDEX register. The address will
increment through 14 bits with every functional clock.

The address is sent to both DITS and DOTS concurrently.

9.1.2 DTS Data Generation

Clearing the DTS_DATALD bit in the CBA scan path will cause whatever value is loaded into the
DTSDATA register to be held for the duration of the RAM test. The generated parity is whatever was
scanned into the DTSDATA parity flops.

Setting the DTS_DATALD bit will cause the DTSDATA register contents to increment every cycle. On
four bit boundaries, the data will increment 0, 1, ... 14, then recycle. A count modulus that was
relatively prime to the RAM address was chosen. The generated parity will either be correct, or all
ones or all zeroes, depending on the state of the force parity sense scan bits in the CBA,
FORCE_PARITY(1:0). Code 00 is normal, code 10 is force all zeroes and code 11 is force all ones.
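The incrementing mode can be sketched as a generator of (address, data) pairs. The sketch makes several assumptions that the text does not confirm: the data word is modelled as a whole number of 4 bit groups (`nibbles`, hypothetical default 4), every group carries the same modulo 15 count, and the count starts from `start_value`. The function name is also hypothetical.

```python
from math import gcd

ADDRESS_BITS = 14   # the address increments through 14 bits (16K entries)
MODULUS = 15        # each 4 bit group counts 0, 1, ... 14, then recycles


def dts_test_pattern(start_index, count, start_value=0, nibbles=4):
    """Yield (address, data) pairs for the incrementing-data test mode."""
    for step in range(count):
        address = (start_index + step) % (1 << ADDRESS_BITS)
        nibble = (start_value + step) % MODULUS
        word = 0
        for _ in range(nibbles):
            word = (word << 4) | nibble   # replicate the count every 4 bits
        yield address, word


# 15 shares no factor with 2**14, so on successive passes a given
# address does not keep receiving the same data value.
assert gcd(MODULUS, 1 << ADDRESS_BITS) == 1
```

A power-of-two modulus such as 16 would tie the data to the low address bits, so an address line fault could go unnoticed. The coprime modulus avoids that.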

9.1.3 DTS Data Writing 

Setting the DTS_DIAGWE bit in the CBA scan path will cause the DTS data source to be written every
cycle.

9.1.4 DTS Data Comparison

Clearing the DTS_DIAGWE bit will cause the RAM data to be compared to the DTSDATA register every
cycle. The parity bits will be checked for as well as the data.

If a compare error is found, the scan bit DTS_TESTERR is set. Once set, the bit remains so until
cleared by scan.

In functional operation, this bit will probably frequently be inadvertently set.

9.2 Cache Data RAM Diagnosis

The BIF CBA provides assistance for the accelerated and simultaneous testing of both the INST and
DATA RAM's and address logic. Four functions are provided that may operate at functional clock
speed.

• 16K entries may be set to a fixed value.

• 16K entries may be read and compared against a fixed value.

• 16K entries may be set to an incrementing value. A modulo 15 counter is used. Data is
replicated every 4 bits.

• 16K entries may be read and compared against an incrementing value. Data is replicated
every 4 bits.

The parity bits associated with these RAM's may also be controlled. They may be jointly forced to 1,
jointly forced to 0, or allowed to operate functionally. During the read and compare mode, the
similarly controlled parity is checked for.

Because the data cache is only one half the size of the instruction cache, the data cache testing with
incrementing data values will have to be stopped after 8192 entries.

Only the CBA, CBD and the external MSI must be clocked while in this test mode. The MMU is likely to
be clocked as well. MMU buried scan state must be set to drive the PA bus onto both the EASRC and
PCSRC buses and to defeat the MMU's driving of the PA bus. MMU buried state must also force the
selection of the data half of the data cache's data store. The secondary TB half of the data cache's
data store will be diagnosed by the MMU. MMU buried state must force the write enable generation in
the data cache RAM's when required. MMU buried state must prevent the MMU from inadvertently
sourcing the DATA or INST buses. The other IC's which touch these buses are assumed not to clock
and to be loaded with a state vector that will keep them from interfering with the RAM diagnostic test.
The number of RAM locations to fill, or check, is decided by the burst count field in the SCR.

9.2.1 Cache RAM Address Generation 

The cache RAM address will be derived from the DTSINDEX. Setting the DTS_DIAGADDR bit in the CBA
scan path will cause an alternative DTS address source to be used. The address will begin at the value
scanned into the DTSINDEX register. The address will increment through 14 bits with every functional
clock.

The address is sent through the invalidate address pipeline to both instruction and data caches.
This pipeline, on top of the external EA and PC registers, makes it a little harder to configure the
address generation. The first desired address should be scanned into INVXFER, the second to
INVARS, the third to DTSINDEX and the fourth and thereafter will be algorithmically generated.

Setting the DTS_DIAGADDR bit in the CBA scan path will also cause the PA bus to be sourced by the
CBA and the invalidate address queue path to be chosen as the CBA's internal PA source.

9.2.2 Cache RAM Data Generation 

Clearing the CACHE_DATALD bit in the CBD scan path will cause whatever value is loaded into the
XDATAIN register to be held for the duration of the RAM test. The generated parity is whatever was
scanned into the XDATAIN 8 parity flops.

Setting the CACHE_DATALD bit will cause the XDATAIN register contents to increment every cycle. On
four bit boundaries, the data will increment 0, 1, ... 14, then recycle. A count modulus that was
relatively prime to the RAM address was chosen. The generated parity will either be correct, or all
ones or all zeroes, depending on the state of the force parity sense scan bits in the CBD,
FORCE_PARITY(1:0). Code 00 is normal, code 10 is force all zeroes and code 11 is force all ones.

9.2.3 Cache Data Writing 

Setting the CACHE_DIAGWE bit in the CBD scan path will cause the cache data source to be driven
every cycle, and the cache data parity source to be driven every next cycle. It's expected that
corresponding state in the MMU will generate the write strobes.

The writing of the RAM parity one cycle after the RAM data will make the proper testing of the last RAM
parity location troublesome.

9.2.4 DTS Data Comparison

Clearing the CACHE_DIAGWE bit will cause the RAM data to be compared to the XDATAIN register
every cycle. The parity bits will be checked for as well as the data.

If a compare error is found, the scan bit CACHE_TESTERR is set. Once set, the bit remains so until
cleared by scan.

In functional operation, this bit will probably frequently be inadvertently set.



9.3 Cache Tag RAM Diagnosis 

The BIF CBA cache RAM diagnostic address generation can be used for the cache tag RAM diagnosis. 
The MMU is responsible for data sourcing and comparison.

9.4 Secondary TB Data RAM Diagnosis

The BIF CBA cache RAM diagnostic address generation can be used for the secondary TB data RAM
diagnosis. The MMU is responsible for data sourcing and comparison.

A.1 X-Bus Interface

This section describes the set of signals of the bus interface chips which actually are used to
communicate on the X-Bus. A more detailed description of the X-Bus and its operation can be found in
the X-Bus AT Specification (KUNE 86). The CPX_ prefix for the X-Bus signals denotes the CPU's
transceived X-Bus, rather than the backplane.

CPX_DATA[63:0]          CBA: 32 io          CBD: 32 io

The CPX_DATA bus is the transceived version of the X-Bus's multiplexed address and data signals.
The CBA will drive and receive the most significant 32 bits of this bus for address information and CSR
access while the CBD's will drive and receive the entire bus for data information. One data chip will
access all the even bytes of the bus and the other all the odd bytes.

The CPX_DATA[63,62,1,0] signals also hold the 4 valid byte indications needed on the 32 bit read and
write commands. When CPX_DATA[63] is asserted, CPX_DATA[31:24] is to be read or written.

CPX_PARITY[7:0]         CBA: 4 io           CBD: 4 io

The CPX_PARITY bus will reflect the byte parity of the CPX_DATA bus where CPX_PARITY[0] is an odd
parity bit for CPX_DATA[63:56]. Parity will be maintained such that the sum of all the bits that are set
in a byte plus the parity bit for that byte will equal an odd number. (An all zero byte will have a parity
bit of 1.) Parity will be driven when the CPX_DATA bus is driven and checked by this interface when
addressed.

CPX_VPNIN[4:0]          CBA: 5 io

The CPX_VPNIN bus receives the 5 least significant X-Bus VPN bits needed by the CBA for proper
indexing of the DTS and primary caches.

When the CBA observes a write operation occurring on the bus, including one that it generated, it will
use a concatenation of CPX_VPNIN[4:0] and CPX_DATA[43:35] as an index into the Duplicate
Instruction Tag Store (DITS) and CPX_VPNIN[3:0] and CPX_DATA[43:35] as an index into the Duplicate
Operand Tag Store (DOTS).
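The two index widths follow from the concatenation: 5 + 9 bits address a 16K entry DITS and 4 + 9 bits address an 8K entry DOTS. The sketch below assumes that bit 0 is the least significant bit of each bus and that the VPN bits form the upper part of the index, in the order the text lists them. Neither assumption is confirmed by the text, and the function name is hypothetical.

```python
def dts_indices(vpnin, data64):
    """Return (dits_index, dots_index) for an observed X-Bus write.

    vpnin  : value on CPX_VPNIN[4:0]
    data64 : value on CPX_DATA[63:0] during the address transfer
    """
    low = (data64 >> 35) & 0x1FF           # CPX_DATA[43:35], 9 bits
    dits = ((vpnin & 0x1F) << 9) | low     # VPNIN[4:0] & DATA[43:35], 14 bits
    dots = ((vpnin & 0x0F) << 9) | low     # VPNIN[3:0] & DATA[43:35], 13 bits
    return dits, dots
```

The DOTS index simply drops the top VPN bit, so the same 9 address bits select the entry within either tag store.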

CPX_VPNOUT[6:0]         CBD: 4 out

The CPX_VPNOUT bus is driven by the two CBD gate arrays. One drives 3, the other 4 signals. The
CPX_VPNOUT bus is sourced during a BIF address transfer.

CPX_ID[3:0]             CBA: 4 io

The CPX_ID bus is driven with the Board ID when the CBA is using the X-Bus. CPX_ID is monitored to
detect a match with Board ID when a READ RESPONSE is decoded on the bus. A match signifies that
this CBA is the destination of the READ RESPONSE transfer.

CPX_SUBID[0]            CBA: 1 io

The CPX_SUBID pin is used by the CBA to distinguish between a data cache and an instruction cache
READ RESPONSE when both requests are outstanding. The CBA also receives and returns this signal
during BIF CSR access.

CPX_CMD[4:0]            CBA: 5 io

The CPX_CMD bus is driven and monitored to signal the type of bus cycle being performed. See the
X-Bus AT Specification for a detailed description of the bus operations.

The CBA will only generate and respond to a subset of the commands.



CBA_ACK[1:0]            CBA: 2 io           CBD: 2 in

The CBA_ACK bus is driven and monitored by the CBA to signal and receive the status of a bus
transaction. The CBA drives CBA_ACK only for a successful acknowledge, or non-parity related
transfer failure. The CBD monitors this signal only to make write queue unloading and arbitration
result decisions. The encodings are listed in the following table:

Code   Response        Description

11     ERROR           Parity error or command reject on previous transmission
10     BUSY            Destination device is not available to accept a command now
01     CMD ACCEPTED    Positive acknowledgement
00     NO RESPONSE     No device has responded

Activity on the CBA_ACK bus always refers to the bus cycle that occurred two cycles earlier.
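The two cycle offset between a bus cycle and its acknowledge can be modelled by pairing two per-cycle traces. This is a sketch of a trace checker rather than of the hardware; the function name, the trace format, and the command strings in the usage are hypothetical.

```python
ACK_NAMES = {
    0b11: "ERROR",
    0b10: "BUSY",
    0b01: "CMD ACCEPTED",
    0b00: "NO RESPONSE",
}


def match_acks(commands, acks):
    """Pair each bus cycle with the CBA_ACK code seen two cycles later.

    commands : per-cycle labels of what was on the bus
    acks     : per-cycle 2 bit CBA_ACK values, on the same cycle numbering
    Cycles whose acknowledge falls beyond the end of the trace are omitted.
    """
    paired = []
    for cycle, command in enumerate(commands):
        if cycle + 2 < len(acks):
            paired.append((command, ACK_NAMES[acks[cycle + 2] & 0x3]))
    return paired
```

For example, `match_acks(["WRITE", "IDLE", "READ"], [0, 0, 1, 0, 2])` reports the write as accepted, the idle cycle as unanswered, and the read as met with BUSY.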
CBD_ACK[1:01 CBD: 2 out 

The CBO.ACK bus is driven t»y the CBD's when a parity error on received bus data is detected, and 
this bus interface was the destination. The are two bus's on the board. CB0Q_ACK(1:0) and 
CBD1_ACK(1:0) togicatty or'd by the backplane drivers. 



CPX ARB INHIBIT 



CBA: 1 io 



CBD: 1 In 



The CPX.ARBJNHIBirsignal wW be asserted by the CBA during all but the last cyde of a multl-oyde 
transaction for which it has ownership of the bus for. The CBA will never attempt to use the X«6us the 
cycie immediately following a cycle in which tiie ARB signal has been deasserted. The CBD monitore 
this signal to deduce the arbitration result. 



CPX LOCKIN 



CBA: 1 in 



The CPX.LOCKIN signal Is received by the CBA. If the CBA has a READ operation pending which wants 
the bus lock, it will not attempt to arbitrate for the bus until tine CPX.LOCKIN signal is unasserted or If it 
is the one that is asserting CPX_LOCI^UT. 



CPX_LOCKOUT CBA: 1 out

The CPX_LOCKOUT signal is asserted by the CBA when it has conducted a READ bus cycle that
needed the bus lock. The CBA will keep CPX_LOCKOUT asserted until it completes either a READ or
WRITE operation which releases the bus lock, or a lock timeout occurs.

CPX_AREQ CBA: 1 in CBD: 1 in

The CPX_AREQ signal is a transceived version of AREQ_SUM on the backplane. This signal is asserted
if any of the class A devices, all of which are at a higher priority than a CPU, want the bus.

CPX_BREQ(3:0) CBA: 4 in CBD: 4 in

The CPX_BREQ signals are the transceived versions of the class B request lines on the backplane. The
CBA will be assigned one of these request levels. The arbitration algorithm is described in section 2.1.
Both CBA and CBD monitor these signals to concurrently decide the arbitration outcome.

CPX_MYREQ CBA: 1 out

The CPX_MYREQ signal is driven by the CBA when the X-Bus is required and not already held.

CPX_REJECTIN CBA: 1 in

The CPX_REJECTIN signal will be monitored by the CBA to determine if a CSR write or read should be
cancelled.

CPX_REJECTOUT CBA: 1 out

The CPX_REJECTOUT signal will be driven by the CBA in the cycle immediately following one in which the
CBA was the bus master, when the effects of that last transaction are to be aborted.

DRIVEXA-
DRIVEXB-
DRIVEXC-
DRIVEXD-
MYACK- CBA: 5 out

These 5 signals enable the X-Bus transceivers. The signals are sourced by the CBA.

A.2 MMU Interface

PA[29:0] CBA: 30 io CBD: 3 in

The PA bus provides the physical address for all memory references made by the MMU. Physical
addresses are presented to the CBA for processor writes, table walking, read miss requests, all
read/write/fetch requests when the MMU is disabled, fetch miss requests, and for broadcasting TLB
invalidates. This bus is tristated when the CBA secures it for a data cache, instruction cache, or TLB
invalidate. In this case, the CBA drives the PA bus with the invalidate address, which is then routed to
the EASRC or the PCSRC bus via the invalidate transceivers, depending upon the type of invalidation
being requested. The CBD receives the 3 lower-order pins to permit it to determine which bytes are
valid on a write.

PCVPN[6:0] CBD: 4 in

The 7 PCVPN signals are received by the CBDs and forwarded to the CPX_VPNOUT for processor
reads and writes. One CBD handles 4 signals, the other 3.



November 15, 1986



EP 0 366 434 A2 



Appendix A - Pins 



EAVPN[6:0] CBD: 4 in

The 7 EAVPN signals are received by the CBDs and forwarded to the CPX_VPNOUT for processor
reads and writes. One CBD handles 4 signals, the other 3.

MMU_CMD[4:0] CBA: 5 in CBD: 5 in

A memory request is initiated by the MMU by asserting the MMU_CMD[4:0] signals. The interpretation
table is in chapters 3 and 4.

The CBA/CBD will always accept commands from the MMU_CMD bus unless WBUF_FULL is asserted.
The MMU_CMD bus, together with PA(2:0), determines which bytes are affected if the operand size is
less than or equal to eight bytes.



BIF_PAARB[1:0] CBA: 2 out

For the interactions between the BIF and the MMU that require the use of the EASRC, PCSRC, or PA
buses, the BIF will assert BIF_ARB[1:0] to acquire control of the necessary buses. The encoding is
available in chapters 3 and 4.

BIF_INVOP[1:0] CBA: 2 out

When the BIF detects a write on the X-Bus that hits on the local processor's cache, it issues a cache
invalidate request by first arbitrating for the appropriate buses and, in the following cycle, asserting the
BIF_INVOP signals, which cause the MMU to clear the valid bits in the identified caches or TB. The
encoding of BIF_INVOP[1:0] is available in chapters 3 and 4.

If the BIF detects a write over the X-Bus that collides with an outstanding cache fill request, which
is not in the DTS, it asserts the code for Cache Invalidate on the BIF_INVOP lines so that the
subsequently returned data is allocated as invalid in the appropriate cache.

MEM_RESP[2:0] CBA: 3 out

The CBA asserts MEM_RESP in response to a load or fetch MEM_CMD. MEM_RESP indicates the
disposition of the returning read data in the cache. The encoding is available in chapters 3 and 4.

HOLD_IVPN CBD: 1 in

HOLD_IVPN is asserted whenever the IVPN is not immediately succeeded in the next cycle by the PA
and the MEM_CMD, which is the case for icache misses that must wait for use of the PA/MEM_CMD
buses, for example.

HOLD_DVPN CBD: 1 in

HOLD_DVPN is asserted whenever the DVPN is not immediately succeeded in the next cycle by the PA
and the MEM_CMD, which is the case for write buffer full stalls, for example.

MMU_HDATA_LD CBA: 1 in CBD: 1 in

The MMU assists the CBD in the load control of its input holding register to its write buffer. This signal
is asserted when the MMU wishes to allow previously transmitted data to be loaded into the write
buffer, and to free up the input holding register for another transaction. The CBA monitors this signal
as well.

WBUF_FULL CBA: 1 out

WBUF_FULL is asserted by the CBA when it can only take one more write request before its write buffer
fills.

A.3 IP Interface

BUS_TRAP_REQ(2:0) CBA: 3 out

BUS_TRAP_REQ is asserted to the IP when the CBA requests an external interrupt, a non-maskable
external interrupt, or a BIF-related error such as lock timeout or write no response.

TRAP_DISP CBA: 1 in

TRAP_DISP signals that the IP is entering a trap sequence. The signal always releases the bus lock, if
held.

A.4 Duplicate Tag Store Interface

DTS_INDEX_SRC[16:03] CBA: 14 out

The DTS_INDEX_SRC bus is used to load the external address register that's used to jointly address
the duplicate instruction and data cache tag store RAMs.

DITS_DATA[29:12] CBA: 18 io

The DITS_DATA bus is used to read and write the instruction cache duplicate tag store contents.

DITS_PARITY CBA: 1 io

DITS_PARITY will contain parity information for the DITS.

DDTS_DATA[29:12] CBA: 18 io

The DDTS_DATA bus is used to read and write the data cache duplicate tag store contents.

DDTS_PARITY CBA: 1 io

DDTS_PARITY will contain parity information for the DDTS.

DTS_CMND[1:0] CBA: 2 out

DTS_CMND indicates what functions, read or write, are to be performed on the duplicate tag stores.

A.5 Cache Interface

DATA[63:00] CBD: 32 io

The 64-bit data cache data bus is directly received and driven by the CBDs. Each IC handles 4 bytes,
split even and odd.

DATA_IPARITY[7:0] CBD: 4 in

There are 8 data cache parity signals. There are separate input and output signals, reflecting the
external 16Kx1 RAM organization. DATA_IPARITY are the 8 RAM outputs. The CBD will check data
cache data parity based on these. Again, there is an odd/even byte split.

DATA_OPARITY[7:0] CBD: 4 out

There are 8 data cache parity signals. There are separate input and output signals, reflecting the
external 16Kx1 RAM organization. DATA_OPARITY are the 8 RAM inputs. The CBD will generate and
source data cache data parity onto them. Again, there is an odd/even byte split.

CHECK_DATA CBD: 1 in

The CHECK_DATA signal is externally derived and instructs the CBD ICs when to check data cache
parity.

INST[63:00] CBD: 32 io

The 64-bit instruction cache data bus is directly received and driven by the CBDs. Each IC handles 4
bytes, split even and odd.

INST_PARITY[1:0] CBD: 1 io

There are 2 instruction cache data parity bits. One covers all odd bytes, and one all even. The signals
are bidirectional. Instruction cache data parity is checked and generated by the CBDs.

A.6 CBA-CBD Control

WBQ_CTL[2:0] CBA: 3 out CBD: 3 in

A three-bit code from the CBA instructs the CBD to transmit, load, or load and merge write data.

NEXTREQ[1:0] CBA: 2 out CBD: 2 in

A two-bit code from the CBA to the CBD informs the latter of the results of the internal arbitration, i.e.,
what goes next on the bus.

FILL_CTL[1:0] CBA: 2 out CBD: 2 in

A two-bit code from the CBA to the CBD controls whether to drive the data or instruction cache data
bus.

DUPMS CBA: 1 out CBD: 1 in

The DUPMS signal from the CBA to the CBD instructs the latter to duplicate the more significant 32 bits
of data on the DATA[31:00] signals for the benefit of the MMU.

DELAYDATA CBA: 1 out CBD: 1 in

The DELAYDATA signal from the CBA to the CBD causes the CBD to inject a one-cycle delay in the
return of read data.

DEST_IS_ME CBA: 1 out CBD: 1 in

The DEST_IS_ME signal from the CBA to the CBD causes the CBD to drive its CBD_ACK bus in the next
cycle, to generate an error acknowledge, if there is a parity mismatch on currently held X-Bus data.

A.7 Miscellany

CLOCK_STOP- CBA: 1 od CBD: 1 od

Each IC in the system may cause the clock to freeze in the event of hardware error. CLOCK_STOP- is
an open-drain signal asserted low to request a clock freeze of the SCR.

SCAN_CTRL[6:0] CBA: 7 CBD: 7

There are 7 scan path control signals: A, B, C, D, and E plus scan data in and scan data out.

CLOCK CBA: 4 in CBD: 4 in

There are 4 clock trees on each IC. Each tree requires a separate input pin.

VDD CBA: 18 CBD: 11

GND CBA: 36 CBD: 22

A.8 Pin Count Summary

ALLOCATED PINS: CBA: 247 CBD: 198

SPARE PINS: CBA: 9 CBD: 10

Chapter 2

X-Bus



2.1 Overview

The X-Bus is the medium used for communications between the eight (future systems
based on this design may use up to twelve) interconnecting processors, memory systems,
and interfaces within the Series 10000 system. It is implemented using open-collector
drivers where signals are active low on the bus. Each device on the X-Bus uses it for all
communications (data transfers, requests for data transfers, interrupts, etc.) with other system
devices. The X-Bus supports tightly coupled processors, but there is no requirement that
the processors in the system be tightly coupled.

The X-Bus is a synchronous bus that achieves its performance by dividing all bus transfers
into a set of one or more bus transactions. Each bus transaction consists of the maximum
amount of information that can be transferred within a single bus cycle. The bus cycle is
defined by the CLOCK* signals. During a given bus cycle there is enough time to pass an
address and/or data from one device to another. Devices are not allowed to hold the bus
for memory access times. The full bandwidth of the bus is available for transferring
information from device to device and is not impacted by a slow device on the bus.

The 64-bit wide X-Bus connects several heterogeneous and/or homogeneous processors,
memory systems, and interfaces within the Series 10000 system. Each device on the bus
uses the bus for all communications with other devices. The communications operations
include data transfers, requests for data transfers, interrupts, and TB invalidate operations.

The X-Bus has a 64-bit wide data path with several devices connected to it. Each device
on the bus has a unique device ID, which is used as part of the address selection
mechanism during certain types of commands. Any device on the bus may arbitrate for the bus,
become master on the bus, and send a command to any other device on the bus.

These commands occupy the bus for a single cycle, although the command operations may
occupy the bus for several cycles. WRITE transfers both address and data information,
while READ transfers only address information. The device receiving these commands
responds with an acknowledgement and completes the requested operation. During read
operations, the receiving device gains access to the X-Bus and initiates a READ RESPONSE
command to the requesting device when the data is available. After the READ or
READ MULT command initiation, and before the READ RESPONSE, the bus is available
for other bus transactions.



2.2 Conventions

This section explains the X-Bus register conventions.



2.2.1 Data Formats

The basic data structures used are the byte (8 bits), word (16 bits), longword (32 bits),
and the quadword (64 bits). The least significant bit of each of these data structures is bit
0; the most significant bit is bit N-1 (where N is the number of bits in the data structure).
The most significant byte in each structure is byte 0; the least significant byte is byte M-1
(where M is the number of bytes in the data structure).
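
The numbering rule above (bit 0 is least significant, byte 0 is most significant) can be checked with a small Python sketch. The helper names are hypothetical and exist only for this illustration.

```python
from typing import Tuple


def byte_bit_range(byte_index: int, size_bytes: int) -> Tuple[int, int]:
    """Return (msb, lsb) bit positions of a byte inside an M-byte structure.

    Byte 0 is the most significant byte; bit 0 is the least significant bit.
    """
    if not 0 <= byte_index < size_bytes:
        raise ValueError("byte index out of range")
    lsb = 8 * (size_bytes - 1 - byte_index)
    return lsb + 7, lsb


def extract_byte(value: int, byte_index: int, size_bytes: int) -> int:
    """Extract the numbered byte from an integer holding the structure."""
    _, lsb = byte_bit_range(byte_index, size_bytes)
    return (value >> lsb) & 0xFF
```

Under this convention, byte 0 of a longword occupies bits 31 through 24, and byte 7 of a quadword occupies bits 7 through 0.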



BYTE
  bits 7-0: Byte 0

WORD
  bits 15-8: Byte 0 | bits 7-0: Byte 1

LONGWORD
  bits 31-24: Byte 0 | bits 23-16: Byte 1 | bits 15-8: Byte 2 | bits 7-0: Byte 3

QUADWORD
  bits 63-56: Byte 0 | bits 55-48: Byte 1 | bits 47-40: Byte 2 | bits 39-32: Byte 3
  bits 31-24: Byte 4 | bits 23-16: Byte 5 | bits 15-8: Byte 6 | bits 7-0: Byte 7

Figure 2-1. Data Formats

2.2.2 Parity



The parity on the X-Bus data fields is odd parity (i.e., the total number of ones in the
data field, including its associated parity bit, equals an odd number, where a one is true
high) unless specified otherwise.



2.3 X-Bus Signal Definitions



Subsections 2.3.1 through 2.3.3.22 contain descriptions of the X-Bus signals.



NOTICE:



An asterisk (*) after a signal name indicates that the signal is
active-low (true). States in this manual refer to the state of the
signal on the backplane. Most drivers/receivers invert the signal,
so it is seen on the backplane as Signal_Name*.



2.3.1 X-Bus Address Path Signals



The X-Bus address path contains the following elements:



• 28-bit Physical Address - ADR*(29:2)/DAT*(61:34)

• 4-bit Valid Byte field - VALDBYT*(3:0)/DAT*(63,62,33,32)

• 7-bit Virtual Address Page Number - VPN*(6:0)

• 4-bit Device ID - ID*(3:0)

• 2-bit Subdevice ID - SUBID*(1:0)



2.3.1.1 ADR*

The ADR* field is used to select a device on the X-Bus and to specify an offset within
that device's address space. The address contained in ADR* is a 30-bit longword address
that is right justified in the field. Data elements that are only a portion of a 32-bit word
are specified via the VALDBYT* field. Writes on the X-Bus consist of a write address and
a 32-bit data transfer, or a write address followed by one or more 64-bit data transfers.
Reads may involve multiple 64-bit data transfers or a single 32-bit transfer.

A WRITE MULT transfer must start and end on an even boundary. A READ MULT
transfer may start and end on an odd boundary. A transfer that starts on a 32-bit
boundary, but not a 64-bit boundary, is indicated by ADR*(2) = 1 in a MULT transfer. A
WRITE or READ MULT transfer may end by writing only the most-significant 32 bits of a
64-bit word, if ADR*(2) = 1 and the number of 32-bit words to be transferred is even. A
READ MULT transfer may only end this way if ADR*(2) = 0 and the number of 32-bit
words to be transferred is odd. ADR* is shared with the most-significant bits of the data
path DAT*(63:32). The parity bits DATP*(0:3) must be valid whenever the ADR* field is
asserted.



2.3.1.2 Addressing

Except for commands such as READ RESPONSE and INVALIDATE, devices determine
whether a bus transaction is directed to them based on ADR(29:22). Devices such as memory
controllers, I/O interfaces, and graphics controllers have a set of programmable registers,
accessible via the Diagnostic Bus (D_Bus), which determine the address range of messages
directed to them. Processor devices reside in the address range 0. If ADR(29:22) are zero,
devices must compare ADR(21:18) with their Device ID to determine if the transaction is
directed to them. This is the mechanism for addressing system-level control and status
registers, and processors. Memory controllers must respond to both an address range and an
ID-directed address. If the command for the transaction is a READ RESPONSE, devices
must compare their ID with the ID field to determine if the transaction is for them.
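
A minimal sketch of the decoding rule described above, assuming the address is held as an integer with ADR(29:22) in bits 29 through 22. The function names and the representation of the programmable range registers as (first, last) pairs are assumptions of this sketch, not part of the specification.

```python
from typing import Iterable, Tuple


def adr_range_field(addr: int) -> int:
    """Return ADR(29:22), the address-range field."""
    return (addr >> 22) & 0xFF


def adr_id_field(addr: int) -> int:
    """Return ADR(21:18), compared with the Device ID when the range is 0."""
    return (addr >> 18) & 0xF


def is_directed_to(addr: int, device_id: int,
                   ranges: Iterable[Tuple[int, int]] = ()) -> bool:
    """Decide whether a transaction addresses this device.

    `ranges` holds inclusive (first, last) ADR(29:22) values that the
    device's range registers were programmed with (hypothetical encoding).
    """
    rng = adr_range_field(addr)
    if rng == 0:
        # ID-directed access to control and status registers.
        return adr_id_field(addr) == device_id
    return any(first <= rng <= last for first, last in ranges)
```

A memory controller would pass both its Device ID and its programmed ranges, since it must respond to either form of address; a processor would pass only its Device ID.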

General X-Bus Addressing

 31 30 | 29          22 | 21              02 01 00
       | Address Range  | Address Offset

Figure 2-2. General X-Bus Addressing



Table 2-1. Address Space

Address Range              Size     Device
00,000,000 - 00,3FF,FFF    4 MB     Control Registers
00,400,000 - 07,FFF,FFF    124 MB   Reserved (Disk and Network Controllers)
08,000,000 - 0F,FFF,FFF    128 MB   Service Processor
10,000,000 - 17,FFF,FFF    128 MB   Graphics Processor 1
18,000,000 - 1F,FFF,FFF    128 MB   Graphics Processor 2
20,000,000 - 2F,FFF,FFF    256 MB   Memory Controller 1
30,000,000 - 3F,FFF,FFF    256 MB   Memory Controller 2



X-Bus Control Register Addressing

 31 30 | 29      22 | 21     18 | 17              02 01 00
       | 00000000   | Device ID | Address Offset

Figure 2-3. X-Bus Control/Status Register Addressing



Table 2-2. Control Register Address Space

Device ID   CSR Range            Device
0           000,000 - 03F,FFF    CPU 0
1           040,000 - 07F,FFF    CPU 1
2           080,000 - 0BF,FFF    CPU 2
3           0C0,000 - 0FF,FFF    CPU 3
4           100,000 - 13F,FFF    Unused
5           140,000 - 17F,FFF    Graphics Processor 1
6           180,000 - 1BF,FFF    Graphics Processor 2
7           1C0,000 - 1FF,FFF    Reserved
8           200,000 - 23F,FFF    Disk Controller 1
9           240,000 - 27F,FFF    Disk Controller 2
A           280,000 - 2BF,FFF    Network Controller
B           2C0,000 - 2FF,FFF    Service Processor *
C           300,000 - 33F,FFF    Reserved
D           340,000 - 37F,FFF    Memory Controller 1
E           380,000 - 3BF,FFF    Memory Controller 2
F           3C0,000 - 3FF,FFF    Unused

* The Service Processor is assigned a device ID of 'B', but it does not
use the specified CSR range.
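
Each CSR range in Table 2-2 follows from placing the Device ID in address bits 21 through 18, which gives every device a window of 0x40000 addresses. The following Python sketch reproduces the table rows; `csr_range` is a hypothetical helper name.

```python
from typing import Tuple

CSR_WINDOW = 1 << 18  # 0x40000 addresses per device, since the ID sits in bits 21:18


def csr_range(device_id: int) -> Tuple[int, int]:
    """Return the inclusive (first, last) CSR addresses for a 4-bit Device ID."""
    if not 0 <= device_id <= 0xF:
        raise ValueError("Device ID is a 4-bit field")
    base = device_id << 18
    return base, base + CSR_WINDOW - 1
```

Note that the footnote above still applies: the Service Processor holds ID 'B' but does not use the range this calculation yields.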






2.3.1.3 VALDBYT*

VALDBYT* is used only on READ or WRITE commands to indicate which bytes within
the 32-bit word addressed by ADR* should be written or read. On a write operation,
VALDBYT*(3) = 1 indicates a write to bits 24 through 31 (most-significant byte) of the
32-bit word. VALDBYT*(0) = 1 indicates a write to bits 0 through 7 (least-significant
byte) of the 32-bit word. On read operations, VALDBYT* has significance only when
reading a location has some side effects (that is, reading a control register in an I/O device
controller, or causing a halfword or byte transfer on the VMEbus). One of the
VALDBYT* bits being set or not set does not determine whether the data for that particular
location is placed on the X-Bus.

Because VALDBYT* is only significant when an address is on the X-Bus, it shares the
DAT* field with ADR* when ADR* is valid on the bus.
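
To make the byte-enable semantics concrete, here is a hedged Python sketch of how a destination might merge a partial write. It works on logical values (1 means the byte is selected), not on backplane signal levels, and the function names are invented for this example.

```python
from typing import List, Tuple


def valid_byte_bit_ranges(valdbyt: int) -> List[Tuple[int, int]]:
    """List the (msb, lsb) bit ranges selected by a 4-bit VALDBYT value.

    VALDBYT(3) selects bits 31-24 and VALDBYT(0) selects bits 7-0.
    """
    return [(8 * i + 7, 8 * i) for i in range(3, -1, -1) if (valdbyt >> i) & 1]


def merge_write(old_word: int, new_word: int, valdbyt: int) -> int:
    """Return the 32-bit word after writing only the bytes VALDBYT selects."""
    mask = 0
    for i in range(4):
        if (valdbyt >> i) & 1:
            mask |= 0xFF << (8 * i)
    return ((old_word & ~mask) | (new_word & mask)) & 0xFFFFFFFF
```

With VALDBYT = 1000 (binary), only the most-significant byte of the stored word changes.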



2.3.1.4 VPN* (Virtual Page Number)

VPN* is used to maintain cache coherency among the various processors in the system.
ADR* represents a physical address, while the processor caches are virtually indexed. The
Virtual Page Number (VPN*) part of the virtual address is placed on the X-Bus along with
the physical address. This provides the information needed to invalidate entries in the
caches during a write over the X-Bus. The cache logic monitors this field along with the
ADR* and ID* fields to determine if there is a cache hit. If there is a cache hit, and it is
caused by another device writing into that location, the cache logic has to invalidate or
update that location in its cache.

The logic that handles read data for caches must also pay attention to this field. Writes to
a location that has a read pending must flag the read so that it does not appear in the
cache as a valid entry. VPN* is actually bits 18 through 12 of the virtual address (the
virtual page number). Although the VPN* is used only during X-Bus write operations, it is
permissible to place information in this field during all X-Bus operations.
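
Since the text defines VPN* as bits 18 through 12 of the virtual address, the field can be extracted as in the following Python sketch. The function name is an assumption of this example.

```python
def vpn_field(virtual_address: int) -> int:
    """Return the 7-bit VPN field, bits 18 through 12 of the virtual address."""
    return (virtual_address >> 12) & 0x7F
```

Bits below 12 and above 18 do not influence the value placed on VPN*.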

The VPN* information is conveyed to the I/O interfaces as part of the I/O mapping tables
setup, prior to the initiation of an I/O transfer. If the new virtual-to-physical mapping is
not known when these tables are set up, the previous virtual mapping of the page is used
in the VPN*. This causes any cache entries for the old mapping to be invalidated while the
I/O transfer is in progress. For certain operations (that is, the Service Processor modifying
a portion of memory), the relevant VPN* information is not known. Therefore, memory
modification by the SP must be handled very carefully. All the processors must be brought
to an idle state before the SP can make its transfer. After the SP transfer, all the caches
must be invalidated.

2.3.1.5 ID*

The ID* field has two main uses. During most commands it is used to identify the device
that has the bus. During the READ RESPONSE command it is used to identify the device
that is to receive the data. Each device on the X-Bus is assigned a unique ID, via a Device
ID field, that is loaded at system initialization. Each device places its Device ID into
the ID* field when it is accessing the X-Bus. It is, however, possible that another device's
Device ID can be placed there under special circumstances (that is, X-Bus diagnostic
testing). It may also be possible for one device to issue a read request and direct the data to
another device by putting the other device's Device ID in the ID* field when issuing the
request. In this mode of operation, the other device must be ready to accept the incoming
data.



2.3.1.6 SUBID*

The SUBID* field is used to distinguish between two or more pending read operations
from a given device. Because read operations can take several cycles to complete, a device
may issue several read operations before any data is returned. Depending on the
implementation of the memory controller, the data may not be returned in the order in which
the reads were initiated. Therefore, the initiating device assigns a unique SUBID* field to
each read operation that is initiated. The slave device returns the SUBID* field when it
returns its data. The device uses this field to identify which request the response is satisfying.
This mechanism is useful in the case of a processor board that has independent Instruction
and Data caches, where each could make its own read request to memory.
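
One way to picture the tagging scheme is a small table of outstanding reads keyed by SUBID. The Python sketch below is a software model only; the class, its method names, and the policy of handing out the lowest free SUBID are assumptions. The limit of four entries follows from SUBID* being a 2-bit field.

```python
class PendingReads:
    """Track outstanding reads of one device by their 2-bit SUBID."""

    MAX_SUBIDS = 4  # SUBID*(1:0) can encode four values

    def __init__(self) -> None:
        self._pending = {}

    def issue(self, requester: str) -> int:
        """Allocate a free SUBID for a new read and remember who asked."""
        for subid in range(self.MAX_SUBIDS):
            if subid not in self._pending:
                self._pending[subid] = requester
                return subid
        raise RuntimeError("all SUBID values are in use")

    def complete(self, subid: int) -> str:
        """Match a READ RESPONSE to its request and free the SUBID."""
        return self._pending.pop(subid)
```

Because responses are matched by SUBID rather than by arrival order, the instruction-cache and data-cache requests in the example from the text can complete in either order.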

2.3.2 X-Bus Data Path Signals

The X-Bus data path contains the following elements:

• 64-bit Data Path - DAT*(63:0)

• 8-bit Data Parity field - DATP*(7:0)

2.3.2.1 DAT*

During commands such as READ or WRITE, DAT*(63:32) is used as the ADR* path.
During WRITE commands, DAT*(31:0) is used to transfer the data that is to be written.
During READ RESPONSE commands, DAT*(63:0) is used to transfer the data.

2.3.2.2 DATP*

The parity bits DATP*(7:0) are associated with the DAT* bits in the following manner:

• DATP*(0) is the parity bit used with DAT*(63:56)

• DATP*(1) is the parity bit used with DAT*(55:48)

• DATP*(2) is the parity bit used with DAT*(47:40)

• DATP*(3) is the parity bit used with DAT*(39:32)

• DATP*(4) is the parity bit used with DAT*(31:24)

• DATP*(5) is the parity bit used with DAT*(23:16)

• DATP*(6) is the parity bit used with DAT*(15:8)

• DATP*(7) is the parity bit used with DAT*(7:0).

Parity bits should be valid for all bytes in the transfer, regardless of whether the data will
actually be used.
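
The mapping above, combined with the odd-parity convention of Section 2.2.2, can be sketched as follows. The sketch operates on logical values (a one is true) rather than backplane levels, and the function names are hypothetical.

```python
from typing import List


def odd_parity_bit(byte: int) -> int:
    """Return the bit that makes the number of ones in byte plus parity odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def datp_for(data64: int) -> List[int]:
    """Return the eight parity bits; list index k corresponds to DATP(k).

    DATP(0) covers DAT(63:56) and DATP(7) covers DAT(7:0).
    """
    return [odd_parity_bit(data64 >> (56 - 8 * k)) for k in range(8)]


def parity_ok(data64: int, datp: List[int]) -> bool:
    """Check received parity bits against the data for all eight bytes."""
    return datp == datp_for(data64)
```

All eight bits are computed even when only part of the data path carries useful data, matching the requirement that parity be valid for every byte in the transfer.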

2.3.3 X-Bus Control Path Signals

The X-Bus control path contains the following elements:

• 5-bit Command field - CMD*(4:0)

• 2-bit Acknowledge field - ACK*(1:0)

• Inhibit Arbitration signal - INH_ARB*

• Bus Lock signal - LOCK*

• X-Bus Lock Request - X_LOCK_REQ*

• 12 Bus Request lines - BUSREQ*(11:0)

• Reject signal - REJ*

• Reset signal - RESET*

• Eight Clock lines - CLOCK*(7:0)

• One Bus Request Summary line - BUSREQ_SUM*



2.3.3.1 CMD*

The CMD* field is driven by the device that is master on the X-Bus during any given
cycle. Typical commands are READ, WRITE, READ RESPONSE, etc.

There is a set of commands that specify a bus broadcast mode where one device may
send a command to all other devices or some subset of devices on the bus. If the two
most-significant bits of the command field are set, the command is a broadcast command
and all devices on the X-Bus accept the command. The bits in the ADR* field could be
used as a mask field to indicate that only a subset of the devices on the X-Bus can pay
attention to the current broadcast command. The implementation of this feature requires the
following special considerations:

• The acknowledge phase of the transfer must be inhibited.

• Each device that is capable of receiving such a transfer must be able to
unconditionally accept such a transfer.

The acknowledge phase must be inhibited to avoid unpredictable results if several devices
were trying to acknowledge a transfer at the same time. If a transfer that requires a
positive action (that is, invalidating a TB entry) is sent to a device, and the device is not
capable of accepting the transfer, system damage may result because the master device does not
get any indication that the transfer is not successful.




Apollo Preliminary and Confidential



Table 2-3 shows the command codes used in the CMD* field.

Table 2-3. X-Bus Command Code Descriptions

Code on
Backplane   Command             Description
00000       WRITE               Write a word of data to the destination device
01000       READ                Initiate a read operation on destination device, single word
01111       READ RESPONSE       Return of longword as a result of a prior READ operation
00111       WRITE DATA          Data transfer associated with a WRITE MULT command
00100       WRITE MULT          Initiate a write operation of one or more 32-bit longwords
01100       READ MULT           Initiate a read operation on destination device, longwords
01110       READ RESP ERR       Return of longword with an uncorrectable error
11xxx       broadcast command   A command that is sent to all devices on the X-Bus
11100       INVALIDATE TB       Invalidate all TB entries (Broadcast)
11110       INVAL TB SEL        Invalidate selected TB entry (Broadcast)
11111       NOP                 No Operation
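
The broadcast rule and the command codes can be illustrated with a short Python sketch. It takes the codes exactly as listed in Table 2-3 and includes only the rows whose codes are legible in the source; the dictionary and function names are assumptions of this example.

```python
# Command codes as listed in Table 2-3 (legible rows only).
COMMANDS = {
    0b00000: "WRITE",
    0b01000: "READ",
    0b00111: "WRITE DATA",
    0b00100: "WRITE MULT",
    0b01100: "READ MULT",
    0b11100: "INVALIDATE TB",
    0b11110: "INVAL TB SEL",
    0b11111: "NOP",
}


def is_broadcast(cmd: int) -> bool:
    """True when the two most-significant bits of the 5-bit command are set."""
    return (cmd & 0b11000) == 0b11000


def command_name(cmd: int) -> str:
    """Return the command name, or 'UNKNOWN' for codes not in the sketch."""
    return COMMANDS.get(cmd & 0b11111, "UNKNOWN")
```

A receiver modelled this way would skip the acknowledge phase whenever `is_broadcast` is true, in line with the special considerations listed above.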



2.3.3.2 ACK*

The ACK* field is driven in cycle N+2 by the device that was the slave in cycle N. Typical
responses are ERROR, COMMAND ACCEPTED, BUSY, and NO RESPONSE. NO RESPONSE is
the response for a nonexistent device.

Table 2-4 lists the ACK* field response codes.

Table 2-4. ACK* Field Response Codes

Code on
Backplane   Response         Description
00          ERROR            Parity error or command reject on previous transmission
01          BUSY             Destination device is not available to accept command
10          CMD* ACCEPTED    Positive acknowledgement
11          NO RESPONSE      No device has responded



Acknowledging BUSY signifies that the destination device is currently not able to accept
the transaction but should be available to accept it soon. Unfortunately, the length of time
that a device is busy can vary from device to device. It is a function of the command it is
currently processing and the amount of X-Bus traffic. The VMEbus interface, for instance,
could be busy for several hundred milliseconds, depending on the response time of the
VMEbus device with which it is communicating.

If a device receives a BUSY acknowledgement to a given transaction, it retries the
transaction.



2.3.3.3 BUSREQ

One BUSREQ line is assigned to each device on the bus. Each device looks at the BUSREQ
lines from the devices of higher priority on the bus, as well as its own. Each device is
given a specific priority level. Whenever a device wants access to the bus, it asserts its
BUSREQ line. During any cycle in which the INH_ARB* line is not asserted, each device
that requests the bus looks at all the BUSREQ lines of higher priority. If none are active, it
takes control of the X-Bus in the next cycle. When it takes control of the bus, the device
deasserts its BUSREQ line and asserts INH_ARB* if it is going to hold onto the bus for
more than one cycle.
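
The decentralized decision described above can be sketched in Python from the point of view of a single device. The list ordering (index 0 is the highest priority) and the function names are assumptions of this sketch; the values are logical, with True meaning asserted.

```python
from typing import List, Optional


def wins_arbitration(device: int, busreq: List[bool], inh_arb: bool) -> bool:
    """Local decision of one device: may it take the bus in the next cycle?

    busreq[i] is the request of the device with priority i, where index 0
    is assumed to be the highest priority.
    """
    if inh_arb or not busreq[device]:
        return False
    # Only the lines of higher priority need to be examined.
    return not any(busreq[:device])


def next_master(busreq: List[bool], inh_arb: bool) -> Optional[int]:
    """Global view: index of the device every local decision agrees on."""
    for device in range(len(busreq)):
        if wins_arbitration(device, busreq, inh_arb):
            return device
    return None
```

Because every device applies the same rule to the same request lines, at most one device concludes that it has won, so no central arbiter is needed.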

Devices such as memory controllers are the highest priority devices on the X-Bus because
it is important for them to empty their queues before further commands can be issued to
them. I/O interface controllers are the next highest priority, thus preventing overrun
conditions in time-critical devices. Processors are lowest priority on the bus.



2.3.3.4 BUSREQ_SUM*

The BUSREQ_SUM* line represents the logical OR of the BUSREQs from the Class A
X-Bus devices. This means that the Class B devices do not have to look at all 11 other
BUSREQs in the arbitration process. They only have to look at BUSREQ_SUM* and the
three other Class B BUSREQs. The Class A devices have to look at only seven other
BUSREQs because they have higher priority than the Class B devices. The Class A devices
generate BUSREQ_SUM* by driving this line low (signal is asserted true low) whenever they
assert their BUSREQ signal. BUSREQ_SUM* is asserted whenever any Class A devices
request the X-Bus.
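
A hedged Python sketch of how the summary line reduces what a Class B device must watch. It assumes a fixed priority order within Class B (index 0 highest), which is an assumption of this illustration and not a statement of the arbitration algorithm used among the Class B devices; all names are hypothetical and values are logical (True means asserted).

```python
from typing import List


def busreq_sum(class_a_requests: List[bool]) -> bool:
    """Logical OR of the Class A BUSREQ lines."""
    return any(class_a_requests)


def class_b_wins(level: int, class_b_requests: List[bool],
                 sum_asserted: bool, inh_arb: bool) -> bool:
    """Decision of a Class B device using only BUSREQ_SUM and Class B lines.

    Index 0 of class_b_requests is assumed to be the highest Class B priority.
    """
    if inh_arb or sum_asserted or not class_b_requests[level]:
        return False
    return not any(class_b_requests[:level])
```

The Class B device therefore inspects five signals (the summary line, its own request, and three others) instead of all twelve request lines.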



2.3.3.5 INH_ARB*

The Inhibit Arbitration (INH_ARB*) signal is asserted true low. It is pulled up on the
backplane so that its idle state is deasserted. The device that is master on the X-Bus may
assert the signal to ensure that it maintains mastership of the bus for additional cycles.
Arbitration for mastership of the bus proceeds whenever the INH_ARB* line is deasserted.
The master holds this signal asserted for as long as it wants to hold the bus, typically not
more than a few cycles. If the device is only going to use the bus for a single cycle, it
should not assert INH_ARB* at all so that bus arbitration may proceed in parallel with its
transfer.

2.3.3.6 LOCK*

The bus lock is the main primitive synchronizing method in the system. It is a single
system-wide resource whose ownership is enforced by the backplane bus protocol.
Instructions that reference memory can request the acquisition or release of the bus lock as part
of the reference. When one processor holds the bus lock, any attempt by another processor
to also acquire the bus lock results in the other processor stalling. Nonlocking memory
operations by any processor are unaffected. The bus lock also plays a crucial role in
assuring program sequentiality because that program's behavior is visible to a second processor.

The bus lock should be held for short time durations only. Extended holding of the bus
lock may hinder multiprocessor performance. There is a timer maintained in the bus
interface to limit the duration that a lock may be held to about 200 microseconds.

The processor's bus interface is designed to implement a lock acquisition fairness
algorithm. The interface guarantees that every processor that requests the bus lock has an
opportunity to secure it before any of the processors may reacquire the lock a second time.
For this reason, the software does not require any minimum wait period between bus lock
acquisitions that would otherwise be necessary to avoid lock starvation.
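
The fairness guarantee can be modelled in software as a round in which no processor is granted the lock twice while another requester is still waiting for its first turn. The Python class below is only a behavioural sketch under that assumption; it does not describe the hardware mechanism, and every name in it is hypothetical.

```python
class BusLock:
    """Behavioural model of a single bus lock with round-based fairness."""

    def __init__(self) -> None:
        self.owner = None
        self.waiting = set()  # processors whose request was refused
        self.served = set()   # processors granted the lock in this round

    def request(self, cpu: int) -> bool:
        """Try to acquire the lock; False means the processor stalls."""
        if self.owner is not None:
            self.waiting.add(cpu)
            return False
        unserved_others = (self.waiting - {cpu}) - self.served
        if cpu in self.served:
            if unserved_others:
                # Someone else has not had a turn yet: no second acquisition.
                self.waiting.add(cpu)
                return False
            self.served.clear()  # every waiter has had a turn; new round
        self.waiting.discard(cpu)
        self.owner = cpu
        self.served.add(cpu)
        return True

    def release(self, cpu: int) -> None:
        """Release the lock; only the current owner may do so."""
        if self.owner != cpu:
            raise RuntimeError("only the owner can release the bus lock")
        self.owner = None
```

In this model a processor that releases the lock and immediately asks again is refused until each stalled requester has held the lock once, which is why software needs no back-off delay between acquisitions.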

The bus lock can be acquired only by the load lock instruction. The lock is released imder 
program control by either a load tmlock or a score unlock instruction. The bus lock is also 
released in the hardware as a side effect of trap invocation. The btis lock may also be ac- 
quired and released in tiser mode. 

Only one processor can hold the bus lock at a given time. Thus, the lock provides the ba- 
sis for mutual exclusion. Holding a bus lock stops other processors from seizing it, but does 
not interfere with any other memory system activity. In particular, bus locks do not slow 
down either DMA or noninterlocked processor reads and writes. 

Securing a bus lock guarantees that all memory stores prior to the load lock instruction
have reached main memory. The load lock that flushes out any buffered writes guarantees
that interlocked code can assume all instructions prior to the interlocked sequence
executed without error. A second benefit of this is that the duration of time that a bus lock
must be held is shortened.



2.3.3.7 LOCK_REQ*

X-Bus lock request is used by a processor requesting a bus lock. This signal is held in the
lock protocol fairness arbitration scheme described in Subsection 2.3.3.6. Bus lock request
ensures that a processor requesting a bus lock gets it before the current bus lock owner
receives a second one.



2.3.3.8 REJ* 

The REJ* line is used to invalidate the transfer in the previous cycle. REJ* in cycle N+1 is
used when a processor wants to issue another write operation before it receives the
acknowledge from its previous write operation (it still cannot execute writes in consecutive
cycles). The processor actually issues the second write in the same cycle that it is receiving
the acknowledge from the first write and then asserts or deasserts the REJ* signal in the
cycle that follows the second write, based on whether it received a busy or positive
acknowledgement to the first write. If the first write was not accepted, the second write is
cancelled, preserving write order.



[Timing diagram: signals CLK, ACK*, and REJ*. A NACK2 label appears on the ACK* trace.
Annotations: "This write is cancelled by REJ*." and "This write is cancelled by NACK1."]

Figure 2-4. Timing on Consecutive Writes from Same Device with the First Write NACKed
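
The REJ* rule for two closely spaced writes can be sketched as a small decision function. This is only an illustration of the rule as described in this subsection: the function name and return layout are hypothetical, and the sketch assumes that the second write would itself be accepted by the slave if it is not rejected.

```python
def pipelined_write_outcome(first_write_ack):
    """Model two writes where write 2 is issued while the ack for write 1 arrives.

    first_write_ack is "ACK" (accepted) or "NACK" (slave busy).
    Returns (first_done, second_done, rej_asserted).
    """
    if first_write_ack not in ("ACK", "NACK"):
        raise ValueError("first_write_ack must be 'ACK' or 'NACK'")
    first_done = first_write_ack == "ACK"
    # REJ* is driven in the cycle after the second write, only if write 1 failed.
    rej_asserted = not first_done
    # A rejected second write never takes effect, so it cannot overtake write 1.
    # Assumption: an unrejected second write is accepted by the slave.
    second_done = not rej_asserted
    return first_done, second_done, rej_asserted
```

In both cases the second write never completes ahead of the first, which is the write-order property the text describes.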



2.3.3.9 RESET*

RESET* is a synchronous signal that is used to initialize all processors in the system. Using
this signal ensures that important machine state information is not destroyed (that is, as
might happen if the memory refresh circuitry were arbitrarily reset while a refresh is in
progress). RESET* is held asserted at power-up time to ensure that there are no tri-state
clashes on buses while interfaces are being initialized.



2.3.3.10 BUSRESET* 

BUSRESET* is a synchronous signal that is used to clear a problem on the X-Bus. A bus
monitor continually checks the bus to make sure that the INH_ARB* line is not asserted
for an abnormally long period of time; if this condition occurs, BUSRESET* is asserted to
remove the condition. Devices must use this signal to reset as much logic as necessary to
remove an abnormal bus condition. BUSRESET* is asserted at power-up time to ensure
that there are no tri-state clashes on the bus while interfaces are being initialized.

2.3.3.11 CLOCK*

One CLOCK* line goes to each device on the X-Bus. Each CLOCK* line carries a 50%
duty cycle signal that runs at the X-Bus frequency. All CLOCK* lines are arranged to
minimize the skew between them at the device's backplane slot. The loads, routing, and
terminations on each device are carefully controlled so that each CLOCK* line sees
identical loading. The CLOCK* signal is used as a reference input to the phase locked loop
(PLL) on each device. The PLL, in conjunction with the SCR, generates a set of clocks
that are synchronous and in phase with the clocks on all other devices, and run at the rate
of the X-Bus.

2.3.3.12 ACLO 

ACLO is a signal from the power controller that indicates the ac power is not within
specifications. On a power-up situation, this signal will not be deasserted until the stability
of the power source is ensured. On a power-down situation, ACLO will be asserted at least
5 msec before dc power becomes out of specifications. On power-up, the RESET* and
BUSRESET* signals are held asserted until at least 200 msec after ACLO is deasserted.

2.3.3.13 SHUT_SW

SHUT_SW is sent from the system's front panel to the SP on the Utility board. The SP
sends SHUT_CMD to the power subsystem to shut down the system's power.

2.3.3.14 SHUT_CMD

SHUT_CMD is sent from the SP to the power supply to initialize a power shut down
sequence.

2.3.3.15 TEMPI 

TEMPI is sent to the power supply when the temperature sensing circuitry senses that the
temperature at certain checkpoints has reached a dangerous level. This signals the power
supply to shut down, preventing possible system damage.

2.3.3.16 TO_SCR

TO_SCR is the line used to initialize communications from each X-Bus board to the SCR.

2.3.3.17 FM_SCR

FM_SCR is the line used to initialize communications from the SCR to each X-Bus board.

2.3.3.18 REQ_OUT

REQ_OUT is the arbitration line used by each X-Bus board to request use of the X-Bus.

2.3.3.19 +12V

+12V is the +12 volts signal from the PSE to the X-Bus controllers.

2.3.3.20 -12V

-12V is the -12 volts signal from the PSE to the X-Bus controllers.

2.3.3.21 -5V

-5V is the -5 volts signal from the PSE to the X-Bus controllers.

2.3.3.22 +5V

+5V is the +5 volts signal from the PSE to the X-Bus controllers.

2.4 X-Bus Arbitration

All X-Bus interfaces except the default owner must request the bus prior to use. There is
one bus request level on the backplane for each X-Bus device. Devices are grouped into
two classes. Class A devices are awarded the bus in strict priority order. Class B devices
participate in fair arbitration and may also be default bus owners. CPUs are Class B
devices.

Bus arbitration is decentralized. Every bus interface decides for itself whether it has gained
access to the X-Bus. Bus arbitration can be inhibited by asserting the ARB_INHIBIT
backplane signal. Only the current owner of the bus may assert this signal. The current owner
does so if the intended bus transfer requires multiple cycles.

2.4.1 Class A Request Override

To request the bus, a Class A device asserts both its assigned request level and the bus
request sum line on the bus. When the BIF detects the bus request sum assertion in an
active bus arbitration cycle, it defers to the Class A device(s).

2.4.2 Class B/CPU Requesting

The Class B devices, the four CPUs, also have fixed priority assignments. Priority
assignments are 0 through 3, with 3 being the highest priority. The assignment is scanned into
the BIF and used to determine which of the four Class B request parallel backplane signals
each CPU uses. The CPU drives its assigned level, and defers to requestors at higher
levels.

Class B devices exercise fair arbitration, and don't reassert their request lines on demand.
Instead, Class B devices snapshot all other lower priority Class B request lines during the
final cycle of a bus ownership. The Class B device then relinquishes the bus and doesn't
reassert a request line until all the snapshotted requests are satisfied. The Class B devices
observe the current state of the other request lines to determine that the other requestors
have been serviced. When a request line is deasserted, service is underway or completed.
If a request line is still asserted, but arbitration is enabled, that requestor wins and service
resumes.
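
The snapshot rule can be illustrated with a small cycle-by-cycle simulation. This is a minimal sketch under several assumptions that the text does not state: every active CPU always has another transfer pending, each ownership lasts exactly one cycle, Class A devices and default ownership are ignored, and snapshots are updated at the end of each cycle (which produces a one-cycle bubble between rounds that real hardware may not have). The function name is hypothetical.

```python
def simulate_fair_arbitration(active, cycles):
    """Return the bus owner for each cycle (None when no device owns the bus).

    active: set of Class B priority levels (higher number = higher priority)
            that always have a transfer pending.
    """
    blocked = {cpu: set() for cpu in active}  # snapshotted requests not yet served
    owner = None
    history = []
    for _ in range(cycles):
        # Request lines this cycle: the owner has dropped its line and
        # devices with an unserved snapshot hold off.
        lines = {cpu for cpu in active if cpu != owner and not blocked[cpu]}
        history.append(owner)
        if owner is not None:
            # Final ownership cycle: snapshot the lower-priority requests.
            blocked[owner] = {cpu for cpu in lines if cpu < owner}
        for cpu in active:
            # A deasserted line means that request is being (or has been) served.
            blocked[cpu] &= lines
        # Highest asserted level wins and owns the bus in the next cycle.
        owner = max(lines) if lines else None
    return history
```

With all four CPUs saturating the bus, the owners rotate 3, 2, 1, 0 and then repeat, rather than CPU 3 winning every cycle as it would under strict priority.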

2.4.3 Default Ownership 

When the bus is otherwise idle, the last successful bidder among the Class B requestors
remains as the default bus owner. The default bus owner may use the bus at the end of
any cycle during which no other request line was asserted. The default bus owner does not
have to assert its request line. The default remains in effect until another Class B device
wins the bus.

A Class B device's bus ownership may be suspended by a Class A device. If a Class A
device assumes control of the bus, the former Class B owner device waits for the bus to
become idle again before reclaiming bus ownership (i.e., the Class B device reassumes
ownership in the cycle following one during which arbitration was permitted, but does not assert
its request line). If another Class B device wins the bus before it becomes idle, default bus
ownership transfers to the latest Class B bus owner.

2.4.4 Acquisition Timeout 

When a BIF first asserts a bus request line, it starts a timer. If the timer elapses before the
bus is acquired, a bus acquisition timeout occurs. The bus timeout duration is
approximately 3.2 milliseconds (16-bit counter). If a timeout occurs, the system is assumed
broken and a clock freeze request is made of the SCR. The internal BIF state is preserved as
much as possible.

The timer is not stopped until either a NOACK* or ACK* signal is received for the request
address transfer. The timer, therefore, expires if a device is continually busy. Broadcast
transfers, such as TB invalidates, stop the timer regardless of the acknowledge line state.
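
As a rough consistency check on the timeout figure, the tick period implied by a 16-bit counter can be computed. This assumes, without confirmation from the text, that the counter advances once per tick and that the timeout fires after the full 2^16 ticks; the function name is illustrative.

```python
def implied_tick_seconds(timeout_seconds, counter_bits):
    """Tick period if a counter of the given width spans the whole timeout."""
    return timeout_seconds / (2 ** counter_bits)


# Values quoted in the text: roughly 3.2 ms with a 16-bit counter.
tick = implied_tick_seconds(3.2e-3, 16)
tick_ns = tick * 1e9          # about 48.8 ns per tick
tick_rate_mhz = 1e-6 / tick   # about 20.5 MHz
```

Under those assumptions the counter would be clocked at roughly 20 MHz. The text does not say whether this matches the X-Bus clock.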

2.4.5 Local Request Prioritization 

Three competing local requestors are internal to the BIF. They include data cache read,
data cache write, and instruction cache read. Data cache read is prioritized over instruction
cache read. In turn, instruction cache read is prioritized over data cache write. The
following list contains exceptions to these rules:

• If the write data queue is full, data cache write is prioritized over an instruction
cache miss.

• If a data cache miss collides in address with a previously queued write, data cache
write is given priority over both data and instruction cache miss.

• If a write to an unencacheable memory location is queued, data cache write is
given priority over both data and instruction cache misses.

• If a write and unlock is queued, data cache write is given priority over both data
and instruction cache misses.

• If a data cache miss from an unencacheable memory location is posted, data
cache write is given priority over both data and instruction cache misses.

• If a data cache miss and lock is posted, data cache write is given priority over
both data and instruction cache reads.

• If a data cache miss and unlock is posted, data cache write is given priority over
both data and instruction cache reads.

• If a TB invalidate is queued in the write buffer, data cache write is given priority
over both instruction and data cache misses.

A locally generated READ RESPONSE required for a BIF CSR read is given precedence
over all other transmitters.
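
The default ordering and its exceptions can be summarized in a short selection function. This is a sketch only: the requestor names and the two flags are hypothetical. `write_must_drain` stands for any of the listed conditions that place data cache write ahead of both kinds of miss (address collision, write to an unencacheable location, write and unlock, unencacheable miss, miss and lock, miss and unlock, TB invalidate queued). The locally generated READ RESPONSE is not modelled.

```python
DEFAULT_ORDER = ("dcache_read", "icache_read", "dcache_write")


def pick_local_request(pending, write_queue_full=False, write_must_drain=False):
    """Choose which pending local requestor goes to the bus next, or None."""
    if write_must_drain:
        # Write is promoted over both data and instruction cache misses.
        order = ("dcache_write", "dcache_read", "icache_read")
    elif write_queue_full:
        # Write is promoted over instruction cache misses only.
        order = ("dcache_read", "dcache_write", "icache_read")
    else:
        order = DEFAULT_ORDER
    for name in order:
        if name in pending:
            return name
    return None
```

For example, with an instruction cache read and a data cache write both pending, the read normally goes first, but the write goes first once the write data queue is full.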

2.4.6 Subsequent Request Arbitration Delay 

The BIF issues subsequent requests from the data cache every other bus cycle (or later).
This assures write order between processors, and read-write order within one processor.
The instruction cache miss request is not restricted to every other cycle. For load and lock,
load and unlock, and store and unlock, subsequent requests are not issued until a
successful bus acknowledge of the prior request is received.

The BIF issues subsequent requests from a CPU every other bus cycle (or later). This
assures write order. For load and lock, load and unlock, and store and unlock, subsequent
requests are not issued until a successful bus acknowledge of the prior request is received.

A fair arbitration scheme is used among the processors, while a strict priority scheme is
used for other X-Bus devices and the processor group. The memory controllers always
have a higher priority than the processors, but one processor cannot lock out the other
processors because of heavy X-Bus requirements.



2.4.6.1 Implementing Fair Arbitration 

Implementing arbitration for the processor group is different from implementing
arbitration for the other X-Bus devices because of the need for "fair" arbitration and because of
the need to optimize their X-Bus access latency. Because the processors are both the
lowest priority devices and the most frequent X-Bus users, they are the default owners of the
bus.

For purposes of implementing fair arbitration, bus request lines are divided into two
classes: A and B. An X-Bus device may only be in one class. Its relative priority position
in that class is established by information which is scanned into the device at system
initialization time.



2.4.6.2 Class A Request Override 

To request the bus, a Class A device asserts both its assigned request level and the bus
request sum line on the bus. When the BIF detects the bus request sum assertion in an
active bus arbitration cycle, it defers to the Class A device(s).

2.4.6.3 Class B/CPU Requesting

The Class B devices, the four CPUs, also have fixed priority assignments. Priority
assignments are 0 through 3, with 3 being the highest priority. The assignment is scanned into
the BIF and used to determine which of the four Class B request parallel backplane signals
each CPU uses. The CPU drives its assigned level, and defers to requestors at higher
levels.

Class B devices exercise fair arbitration, and don't reassert their request lines on demand.
Instead, Class B devices snapshot all other lower priority Class B request lines during the
final cycle of a bus ownership. The Class B device then relinquishes the bus and doesn't
reassert a request line until all the snapshotted requests are satisfied. The Class B devices
observe the current state of the other request lines to determine that the other requestors
have been serviced. When a request line is deasserted, service is underway or completed.
If a request line is still asserted, but arbitration is enabled, that requestor wins and service
resumes.



2.4.6.4 Default Ownership 

When the bus is otherwise idle, the last successful bidder among the Class B requestors
remains as the default bus owner. The default bus owner may use the bus at the end of
any cycle during which no other request line was asserted. The default bus owner does not
have to assert its request line. The default remains in effect until another Class B device
wins the bus.

A Class B device's bus ownership may be suspended by a Class A device. If a Class A
device assumes control of the bus, the former Class B owner device waits for the bus to
become idle again before reclaiming bus ownership (i.e., the Class B device reassumes
ownership in the cycle following one during which arbitration was permitted, but does not assert
its request line). If another Class B device wins the bus before it becomes idle, default bus
ownership transfers to the latest Class B bus owner.



2.5 Command Formats 

Except for the NOP command, correct parity must be maintained on the DAT* field (in 
some cases labeled ADR) at all times. The sample commands in the following subsections 
are shown as being initiated by a device with an ID of 0x05 and a SUBID of 0x03. 

NOTE: All fields and notes in the following illustrations are shown in
backplane polarity.

2.5.1 Write

[Command format diagram]

  CMD (4:0)       0 0 0 0 0
  ID (3:0)        1 0 1 0
  VPN             Virtual Address (18:12)
  SUBID (1:0)     0 0
  VALDBYT (3:0)   0 0 0 0  (write bits 31 thru 24, 23 thru 16, 15 thru 08, 07 thru 00)
  ADR (31:30)     VALDBYT[3:2]
  ADR (29:02)     Destination Address (PA[23:2])
  ADR (01:00)     VALDBYT[1:0]
  DAT (31:00)     Data to Be Written (DATA[31:0])

Figure 2-5. X-Bus WRITE Command Example

2.5.2 Read

[Command format diagram]

  CMD (4:0)       0 0 0 0 1
  ID (3:0)        1 0 1 0
  VPN             Virtual Address
  SUBID (1:0)     0 0
  VALDBYT (3:0)   0 0 0 0  (read bits 31 thru 24, 23 thru 16, 15 thru 08, 07 thru 00)
  ADR (31:30)     VALDBYT[3:2]
  ADR (29:02)     Destination Address (PA)
  ADR (01:00)     VALDBYT[1:0]

Figure 2-6. X-Bus READ Command Example
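
The ID and SUBID bit patterns in the WRITE and READ examples are consistent with the sample device (ID 0x05, SUBID 0x03) if backplane polarity is taken to mean that each bit is inverted on the bus. That reading is an assumption, as is the helper name below; the sketch simply shows the conversion.

```python
def to_backplane(value, width):
    """Return the bit string for a field value, assuming backplane polarity inverts every bit."""
    mask = (1 << width) - 1
    if not 0 <= value <= mask:
        raise ValueError("value does not fit in the field")
    # Invert every bit, then keep only the bits that belong to the field.
    return format(~value & mask, "0{}b".format(width))
```

Under this assumption `to_backplane(0x05, 4)` gives `1010` and `to_backplane(0x03, 2)` gives `00`, matching the ID and SUBID fields shown in the examples.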
2.5.3 Read Response

[Command format diagram]

  CMD (4:0)         0 1 1 1 1
  ID (3:0)          Returned as Sent
  VPN               Not Used
  SUBID (1:0)       Returned as Sent
  ADR/DAT (63:32)   Most Significant 32 Bits of Data
  DAT (31:00)       Least Significant 32 Bits of Data

Figure 2-7. X-Bus READ RESPONSE Command Example



2.5.4 Write Data 



[Command format diagram]

  CMD (4:0)         0 0 1 1 1
  ID (3:0)          1 0 1 0
  VPN               Virtual Address (18:12)
  SUBID (1:0)       0 0
  ADR/DAT (63:32)   Most Significant 32 Bits of Data
  DAT (31:00)       Least Significant 32 Bits of Data

Figure 2-8. X-Bus WRITE DATA Command Example
2.5.5 Write Mult

[Command format diagram]

  CMD (4:0)       0 0 1 0 0
  ID (3:0)        1 0 1 0
  VPN (6:0)       Virtual Address (18:12)
  SUBID (1:0)     0 0
  ADR (63:32)     Physical Address, with direction bit D
  DAT (31:0)      First 32 Bits of Data if Address is an Odd Longword (32-Bit) Address

  D   Direction
  1   Ascending
  0   Descending

Figure 2-9. X-Bus WRITE MULT Command Example
2.5.6 Read Mult

[Command format diagram]

  CMD (4:0)       0 1 1 0 0
  ID (3:0)        1 0 1 0
  VPN (6:0)       Virtual Address (18:12)
  SUBID (1:0)     0 0
  ADR (63:62)     LL
  ADR (61:34)     Physical Address
  ADR (33)        W
  ADR (32)        E
  DAT (07:00)     Longword Count

  LL   Transfer length
  11   2 longwords
  10   4 longwords
  01   8 longwords
  00   16 longwords

  E    Count enable
  1    Count in DATA (7:0)
  0    Count defined by LL

  W    Address wrap
  1    No wrap
  0    Modulo wrap

Figure 2-10. X-Bus READ MULT Command Example
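
The length selection in the READ MULT example can be sketched as a lookup. The sketch takes the codes exactly as printed in the figure, reads the damaged last entry as 16 longwords, and assumes that DATA (7:0) holds the count directly when E is set; none of these points is confirmed by the text, and the names are illustrative.

```python
# Transfer length selected by the two-bit LL field, as printed in the figure.
LL_TO_LONGWORDS = {0b11: 2, 0b10: 4, 0b01: 8, 0b00: 16}


def read_mult_longwords(ll, count_enable, data_low_byte=0):
    """Number of longwords requested by a READ MULT command."""
    if count_enable:
        # E = 1: the count is taken from DATA (7:0).
        return data_low_byte & 0xFF
    # E = 0: the count is defined by the LL code.
    return LL_TO_LONGWORDS[ll & 0b11]
```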
2.5.7 Read Response Error

[Command format diagram]

  CMD (4:0)       0 1 1 1 0
  ID (3:0)        Returned as Sent
  VPN (6:0)       Not Used
  SUBID (1:0)     Returned as Sent
  DAT (31:00)

Figure 2-11. X-Bus READ RESPONSE Command Example

The READ RESPONSE ERROR command is sent back to the requesting device in place of
a READ RESPONSE command when a condition arises during the read operation that
prevents it from being completed. The READ RESPONSE ERROR terminates the read
operation. The read operation is considered to be fulfilled and subsequent data in a READ
MULT operation is discarded.



2.5.8 Invalidate TB

[Command format diagram]

  CMD (4:0)       1 1 1 0 0
  ID (3:0)        1 0 1 0
  VPN             Not Used
  SUBID (1:0)     Not Used
  ADR (31:00)
  DAT (31:00)

Figure 2-12. X-Bus INVALIDATE TB Command Format



2.5.9 Invalidate TB Sel

[Command format diagram]

  CMD (4:0)       1 1 1 1 0
  ID (3:0)        1 0 1 0
  VPN (6:0)       Not Used
  SUBID (1:0)     Not Used
  ADR (31:12)     Virtual Address to be Invalidated
  ADR (11:00)
  DAT (31:00)

Figure 2-13. X-Bus INVALIDATE TB SEL Command Format

2.5.10 NOP

[Command format diagram]

  CMD (4:0)       1 1 1 1 1
  ID (3:0)        Not Used
  VPN             Not Used
  SUBID (1:0)     Not Used
  ADR (31:00)
  DAT (31:00)     Not Used

Figure 2-14. X-Bus NOP Command Format



2.6 Write Sequences 



The following subsections describe the transactions that take place during write and write
multiple bus transfers.

2.6.1 WRITE (Single 32-Bit Write)

A write sequence on the X-Bus consists of four phases: request, arbitration, transfer, and
acknowledge. Some of these phases may happen in the same bus cycle. All may be
overlapped with some phases from other transfers. During the request phase a device asserts its
BUSREQ line to indicate to all other devices that it wants to gain access to the bus. If the
INH_ARB* line is not asserted when a device asserts its BUSREQ line, the request and
arbitration phases occur in the same cycle. If higher priority devices are asserting their
BUSREQ lines, the arbitration phase may last for several cycles. Once a device has
asserted its BUSREQ line, the INH_ARB* line is not asserted, and there are no higher
priority BUSREQs asserted, the device owns the X-Bus in the next bus cycle.

[Timing diagram: signals CLK, BUSREQ, INH_ARB*, ADR*/DAT*/VALDBYT*, CMD*/ID*/SUBID*,
and ACK*.]

Figure 2-15. WRITE Timing
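
The condition for winning the bus at the end of the request and arbitration phases can be written as a single predicate that each device evaluates for itself. This is a sketch with hypothetical names, and it assumes that a larger level number means higher priority, as stated for the Class B levels.

```python
def owns_bus_next_cycle(my_level, asserted_levels, inh_arb_asserted):
    """True if the device at my_level owns the X-Bus in the next bus cycle.

    asserted_levels: priority levels whose BUSREQ lines are asserted this cycle.
    """
    if inh_arb_asserted or my_level not in asserted_levels:
        # Arbitration is inhibited, or this device is not requesting.
        return False
    # The device wins only if no higher priority BUSREQ is asserted.
    return my_level == max(asserted_levels)
```

Because every interface sees the same request lines, each one reaches the same conclusion without a central arbiter.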

When the device gains access to the bus, the transfer phase begins. For a WRITE, the
transfer phase lasts only one cycle. During this cycle the master device:

• Deasserts its BUSREQ signal

• Does not assert the INH_ARB* signal

• Asserts the WRITE command on the CMD* signals

• Asserts its own device ID on the ID* signals

• Asserts a unique ID on SUBID*. This field is optional and could be used to help
steer error information returned from the slave device to the proper area within
the master device.

• Asserts the address to be written on ADR*

• Asserts the virtual page number of the address to be written on VPN*

• Identifies the bytes to be written by asserting the valid bits within VALDBYT*

• Asserts the data to be written on DAT* (31:0)

In the cycle after the transfer phase, the slave does some preliminary checking on the
transmission (Is the command valid? Is it addressed to this device? Is the parity good? Is
the device able to accept such a command at this time? etc.). At this time, the master
deasserts all of the signals that it asserted in the transfer phase. In the next cycle, the
master and slave devices enter the acknowledge phase where the slave sends the master some
preliminary data regarding the status of the transfer. Since the slave may be busy
processing the transfer request for several bus cycles, it needs to have some buffering, or a way to
reject a transfer request when busy, or both.

To ensure that write order is preserved, it is illegal for a device to attempt writes in two
consecutive cycles. If a write were attempted in the cycle following a write, there would not
be any way to prevent it from executing before the previous write if the previous write
received a negative acknowledgement. A second write may be issued in the same cycle as the
acknowledgement from the first write is received if the REJ* signal is used to cancel the
second write (if the first write was not accepted).

2.6.2 WRITE MULT (Multiple 32-Bit Write)

Figure 2-16 shows the timing of the WRITE MULT command.

[Timing diagram: signals CLK, BUSREQ, INH_ARB*, ADR*/ID*/SUBID*, CMD*, DAT*, and ACK*.]

Figure 2-16. WRITE MULT Command Timing

A WRITE MULT instruction involves the transfer of one or more 32-bit words. The
transfers must adhere to 32-bit boundary alignment at the start of the transfer and they must
adhere to 64-bit boundary alignment thereafter. WRITE MULT differs from WRITE in
that the VALDBYT signals are ignored in WRITE MULT (writes of partial 32-bit words
are not allowed). If the transfer is a WRITE MULT, the sequence of X-Bus events is
different than for a WRITE transfer. The request and arbitration phases proceed in the same
fashion as the WRITE sequence, but the transfer and acknowledgement phases proceed
differently. During the first transfer cycle the master device:

• Deasserts its BUSREQ signal

• Asserts the INH_ARB* line

• Asserts the WRITE MULT command on the CMD* signals

• Asserts its own device ID on the ID* signals

• Asserts a unique ID on SUBID*. This field is optional but could help steer error
information from the slave device to the proper area within the master device.

• Asserts the starting address to be written on ADR*

• Asserts the virtual page number on VPN*

• Does not use the VALDBYT* lines

• Asserts the first 32 bits of the transfer on DAT* (31:0) if the transfer does not start
on a 64-bit boundary

During the next transfer cycle the master device:

• Asserts the WRITE DATA command on the CMD* signals

• Asserts the data to be transferred on DAT* (63:0)

• Leaves the other bus signals as they were set in the previous bus cycle, unless this
cycle contains the last data transfer. In this case, the INH_ARB* line is
deasserted.

The slave device:

• Performs some preliminary checks on the transaction of the previous cycle

• Sends an acknowledgment to the master during the next cycle, based on these
preliminary checks

This cycle is repeated until sufficient data has been transferred. The slave device responds
to each transfer cycle with an acknowledgement two cycles after the transfer cycle.

2.6.3 Error Recovery During Writes

Writing data that is less than a longword is not possible with the WRITE MULT command.
Data is stored and checked in memory as 32-bit quantities. When writing a portion of a
32-bit word, the memory controller must first read, check, and correct the word that is
currently at that location. Then it merges the new data with the old data, computes new
check bits, and writes the new data and check bits. If the check portion of this operation
detects an uncorrectable data error, continuing the operation could destroy data. In this
case, the write is inhibited.

2.6.4 Features 

There are some restrictions on using the WRITE MULT command concerning the starting
address of the block to be written. The memory controller has more than one bank. The
WRITE MULT command cannot cross a bank boundary. If it does cross this boundary,
the memory controller may reject only part of the transfer because of a busy bank
condition. This causes the entire transfer to be retried. The retry operation then finds that the
other bank is busy, causing further retries. Since memory is managed on a virtual page
basis, and virtual pages don't cross bank boundaries, this restriction has minimal impact.
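
A master could check the bank restriction before issuing a WRITE MULT. The sketch below assumes that banks are contiguous, equally sized address ranges and that the bank size is known to the caller; the text does not describe the bank organization, so both points and the function name are hypothetical.

```python
LONGWORD_BYTES = 4


def crosses_bank_boundary(start_addr, num_longwords, bank_size_bytes):
    """True if a transfer of num_longwords starting at start_addr spans two banks."""
    if num_longwords < 1 or bank_size_bytes < 1:
        raise ValueError("length and bank size must be positive")
    last_addr = start_addr + LONGWORD_BYTES * num_longwords - 1
    # Assumption: bank number is simply the address divided by the bank size.
    return start_addr // bank_size_bytes != last_addr // bank_size_bytes
```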



2.7 Read Sequences 

2.7.1 READ and READ MULT 

The READ and READ MULT commands are very similar. The READ command involves
a single 32-bit transfer, uses the VALDBYT* signals, and attaches significance to
ADR(2)*. READ MULT deals with transfers involving multiples of 32 bits, and does not
use the VALDBYT* signals. The remainder of this section focuses on the READ MULT
command.

The read sequence on the X-Bus is more complex than the write sequence. It is actually
broken down into two distinct sequences: a read command sequence, and a read response
sequence. The read command sequence resembles the write sequence, except that the
DAT* and VPN* fields are not used. If the command is a READ MULT, a code
specifying the number of longwords to be transferred is placed in DAT* (7:0). After completing
the read command sequence, the master device gives up the bus to any requesting device.
Then the target device fetches the requested data. When the data is available, the device
that was the target of the read command initiates the read response sequence.

[Timing diagram: signals BUSREQ*, INH_ARB*, ID*/SUBID*, CMD*, DAT*, and ACK*, showing
two read response cycles, each with its data and acknowledge.]

Figure 2-17. READ RESPONSE Command Timing



The read response sequence, initiated by the target of the read command sequence,
proceeds as follows:

• The device executes the request and arbitration bus phases. It then enters the
transfer phase.

• Deasserts its BUSREQ signal. The device may leave its BUSREQ asserted until the
beginning of the last transfer cycle.

• Allows the INH_ARB* line to stay deasserted (i.e., does not assert the
INH_ARB* signal) if the transfer is a single transfer. If this is a multiple transfer,
the INH_ARB* line must be asserted until the beginning of the last transfer.

• Asserts the READ RESPONSE command on the CMD* signals.

• Asserts the device ID of the device that initiated the read command sequence on
the ID* signals.

• Asserts the unique ID that was sent on the SUBID* field during the read
command sequence back onto the SUBID* signals.

• Asserts the read data on DAT* (63:0).

At this point, the first cycle of the transfer phase in the read response sequence is
complete. The transfer phase may continue if the read requested multiple transfers. In either
case, both devices enter an acknowledge phase and may, in fact, be simultaneously in an
acknowledge and transfer phase. There must be an acknowledge cycle for each transfer
cycle. The transfer and acknowledge phases continue for as many cycles as necessary to
deliver the requested amount of data.

2.7.2 Error Recovery During Reads

When the device being read is a memory controller, the device anticipates the availability
of the data and requests access to the X-Bus before the data is actually ready. This
minimizes the access time for the requesting processor. In some situations, the data is available
at the beginning of the cycle in which the data is to be transferred over the bus. This
presents some problems concerning what is to be done about correctable and uncorrectable
errors on the data that is being sent. This information is available at the end of the cycle
in which the data is transferred. This is too late to stop the transfer, but not too late to
assert the REJ* line in the next cycle. REJ* tells the destination device to disregard the
data it has just received. This causes the destination device to ignore the last transaction
and go back to waiting for the read response (equivalent to executing a NOP command). If
the transfer is a multiple quad word transfer (i.e., the response from a READ MULT
command), only the transfer that was sent in the cycle prior to activating REJ* should be
discarded. If the memory controller needs to cancel two successive transfers, it must assert
the REJ* line for two consecutive cycles.
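
From the receiving device's side, the REJ* rule amounts to dropping any transfer that is followed by REJ* in the next cycle. A minimal sketch, with hypothetical names and with cycles represented as plain integers:

```python
def accepted_transfers(transfers, rej_cycles):
    """Return the data items the destination keeps, in cycle order.

    transfers:  dict mapping bus cycle number -> data received in that cycle.
    rej_cycles: set of cycle numbers in which REJ* was asserted.
    """
    kept = []
    for cycle in sorted(transfers):
        # REJ* in cycle N+1 cancels only the transfer sent in cycle N.
        if cycle + 1 not in rej_cycles:
            kept.append(transfers[cycle])
    return kept
```

Asserting REJ* in two consecutive cycles discards two successive transfers, as the text requires.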

If a correctable error is detected while the memory controller is still asserting the
INH_ARB* line, the pipelines in the memory controller are stalled. The corrected data is
also transmitted across the X-Bus, and then the pipelines are unstalled. If the error is not
detected until the INH_ARB* line has already been deasserted, it is too late for the
memory controller to hold onto the bus, so it must rearbitrate. Once the controller acquires the
bus again, it re-transfers the data, starting with the data that was corrected.

If an uncorrectable error was detected, the memory controller returns the data that was
read with a READ RESPONSE ERROR command and continues processing data in the
normal manner. The error address is saved in registers accessible via the X-Bus and the
scan interface.

2.7.3 Features 

Read returns from the memory system have a high priority to minimize read latency time 
and to keep the memory queues as empty as possible. Since, in many cases, the processor 
is stalled until the read data it requested is returned, the read process must be as efficient 
as possible. 

The READ MULT command is used for I/O block transfer and cache fill operations. For
cache fill operations, the cache might request 4 quad words of data aligned on a quad
word boundary. In addition, the cache might want the second quad word in that group to
be delivered first in order to streamline the cache-to-IP interface. The memory controllers
support this function by allowing the requesting device to specify the starting quad word
address and the amount of data to be transferred. In addition to specifying the amount of
data to be transferred, the parameter for the number of quad words to be transferred is
also used to determine the starting address of the block to be transferred (don't confuse
this with the starting address of the quad word to be transferred). For instance, if the
transfer size is two, the two quad words with ADR(3)* = 0 and 1 starting at the address
specified in ADR* are transferred. If the transfer size is four, the four quad words with
ADR(3)* and ADR(4)* = {00, 01, 10 and 11} starting at the address specified in ADR*
are transferred. Figure 2-18 shows the complete X-Bus timing cycle for the READ,
WRITE, and READ RESPONSE commands.
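
The block addressing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `read_mult_order` is hypothetical, a quad word is taken to be 8 bytes (so that ADR(3) is the lowest bit that selects a quad word), the transfer size is assumed to be a power of two, and the wrap-around delivery order after the requested quad word is an assumption rather than something the text states.

```python
QUAD_WORD_BYTES = 8  # assumption: ADR(3) is the lowest quad-word select bit


def read_mult_order(start_addr: int, n_quad_words: int) -> list[int]:
    """Return quad-word byte addresses in an assumed delivery order."""
    if n_quad_words <= 0 or n_quad_words & (n_quad_words - 1):
        raise ValueError("this sketch assumes a power-of-two transfer size")
    block_bytes = n_quad_words * QUAD_WORD_BYTES
    # The transfer size also fixes the start of the block being transferred.
    block_base = start_addr & ~(block_bytes - 1)
    # Index of the requested quad word inside that block.
    first = (start_addr - block_base) // QUAD_WORD_BYTES
    # Assumed order: requested quad word first, then wrap around the block.
    return [
        block_base + ((first + i) % n_quad_words) * QUAD_WORD_BYTES
        for i in range(n_quad_words)
    ]
```

For a four quad word fill that asks for the second quad word first, `read_mult_order(0x108, 4)` yields the addresses 0x108, 0x110, 0x118 and 0x100 under these assumptions.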



[Figure 2-18. Complete X-Bus Timing. Signals shown: CLK, INH_ARB*, BUSREQ1, BUSREQ2,
BUSREQm, ADR*, VALDBYT*, ID*/SUBID*, DAT*, CMD*, ACK*. Transactions shown: READ MULT,
from Device 1 to memory; WRITE MULT, from Device 2 to memory; READ RESPONSE, from memory
to Device 1.]
characteristics of these instructions that distinguish them from other X-Bus commands and
the normal interrupt process.

First, these commands are broadcast commands, which means that they can be sent from
one processor to every other processor in the system during a single X-Bus cycle. This
characteristic means that the transaction does not receive an acknowledgement, since the
acknowledgement from all the target processors would overlap.

Processing these commands is time-critical. The processor limits the number of bus
transactions that are executed using an old translation mapping. The processor must also be
ready to accept a new INVALIDATE command immediately because there is no mechanism
for rejecting the command, and it is important that these commands do not get lost.
If a device receives too many SELECTIVE INVALIDATE signals to process at once, it
goes into a catch-up mode, where it invalidates the whole TB. (It is assumed that
invalidating the whole TB can be done faster than the selective invalidate. The device may
also raise an invalidate overrun condition.)
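
The catch-up behaviour can be pictured with a small model. The sketch below is not the hardware design: the class name `TranslationBuffer`, the `queue_depth` limit and the `overrun` attribute are hypothetical names chosen for illustration, and the text does not say how many pending invalidates a device can hold.

```python
from collections import deque


class TranslationBuffer:
    """Toy model of selective invalidates with a catch-up mode."""

    def __init__(self, queue_depth: int = 4):
        self.entries = {}               # virtual page -> physical page
        self.pending = deque()          # selective invalidates not yet applied
        self.queue_depth = queue_depth  # hypothetical capacity
        self.overrun = False            # models the invalidate overrun condition

    def receive_selective_invalidate(self, vpage: int) -> None:
        # The command cannot be rejected, so it must always be absorbed.
        if len(self.pending) >= self.queue_depth:
            # Catch-up mode: drop the whole TB instead of queueing further.
            self.entries.clear()
            self.pending.clear()
            self.overrun = True
        else:
            self.pending.append(vpage)

    def drain(self) -> None:
        # Apply queued selective invalidates one at a time.
        while self.pending:
            self.entries.pop(self.pending.popleft(), None)
```

The point of the design is that no invalidate is ever lost: when the queue cannot absorb another entry, flushing everything is a safe superset of the requested invalidations.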

Because the data that is affected by the INVALIDATE commands is shared among the
processors in the system, the INVALIDATE commands are issued under a bus lock. This
is done so that the invalidation of the TB doesn't interrupt any in-progress interlocked
operation, leaving data in a half-modified state.

Even if the INVALIDATE commands took zero time to execute, there may be further
accesses to pages that were previously mapped and unmapped because of the INVALIDATE
command. The processors' write buffers may still have transactions pending which were
based on the previous mapping. Therefore, before an unmapped page is reused or written to
disk, the process managing the page must ensure that all pending transactions at the time of
the invalidate have traversed the X-Bus interface. To accomplish this, the process issues an
interrupt to each relevant processor and waits for an interrupt acknowledge. The interrupt
acknowledge ensures that all buffers are flushed before the interrupt acknowledge reaches
the X-Bus.

2.8.2 I/O Interrupts 

If an interrupt is a device completion interrupt, the interrupted processor can't act on the
interrupt until the data transfer is complete. This is necessary because of the buffering
present between the I/O bus interface and the X-Bus. If the channel for the interrupt is
independent of the data buffering, the interrupt could be processed before the data transfer is
complete. Since the interrupt is actually an X-Bus write operation, the interrupt follows the
prior data transfers onto the X-Bus. If the interrupt path was separate from the data path,
the data could be synchronized at the time of the interrupt acknowledge cycle by requiring
that the write buffers be emptied prior to returning data in response to the interrupt
acknowledge.

The interrupt acknowledge cycle generally reads either a status register or, in the case of 
an auxiliary bus, interrogates the bus to find out which specific device caused the interrupt. 
The interrupting device must know that the interrupt has not been serviced. It only clears
this interrupt pending status upon an interrupt acknowledge response from the interrupted
device. This interrupt acknowledge typically takes the form of a read to a specific address
in the interrupter. The interrupter contains an X-Bus accessible register which is loaded by
a processor prior to any I/O activity which specifies the address to be used by the
interrupter for addressing interrupt writes. If it must direct different types of interrupts to
either different interrupt flags or different devices on the X-Bus, the interrupter has multiple
interrupt address registers.



2.9 Error Recovery 

The primary goal of the error detection and recovery is to ensure that user data is not
corrupted as a result of abnormal conditions. The secondary goals are to maintain a high level
of availability and diagnosability in the system. This is especially important in a
multiprocessor system where quite a bit of system resources could be made unavailable if a
processor or memory bit malfunctions. A summary of the types of possible system related
errors and the system response to those errors is shown in Table 2-5.

Table 2-5. System Error and Response Summary

Type of Error

Bus acquisition timeout;
Read response timeout;
Error acknowledgement parity error on data

No response acknowledgement:
Missing device;
Parity error on address;
Lock timeout

Powerfail

Read Response Error ECCU on read operation
Bus error on VMEbus operation;
No IRQ asserted at IACK time during VMEbus interrupt
Bus error on VMEbus write

Action

Freeze clocks; invoke Service Processor

Execute check (read); optionally invoke SP;
Fetch check (write)

Interrupt: gracefully terminate disk operation
Execute check

VMEbus interface posts interrupt

Abnormal Condition Detected: Sequential mode selected but not all UVALID* bits are asserted
Gate Array/Utility Board Action: UBERR* asserted and cycle aborted, status bit is set in
register of U-bus master and a CPU interrupt is generated.

Abnormal Condition Detected: Parity error on data from IO map
Gate Array/Utility Board Action: MP_PERR is asserted for the duration of time that the
parity error exists; this signal should be examined at the time that a U-bus timeout occurs
to determine if an IO map parity error is present. This will cause an interrupt to the CPU.
Table 2-5. System Error and Response Summary (Cont.)

Type of Error: No response acknowledgement on X-bus read
Action: UBERR* asserted and cycle aborted, status bit is set in register of U-bus master
and a CPU interrupt is generated.

Type of Error: No response acknowledgement on X-bus write, compact off
Action: UBERR* asserted and cycle aborted, status bit is set in register of U-bus master
and a CPU interrupt is generated.

Type of Error: No response acknowledgement on X-bus write (compact on) or response to a
requested read
Action: NRSF (one cycle) asserted. This causes a status bit to be set and a CPU interrupt
generated.

Type of Error: Read Response Error on requested read data
Action: UBERR* asserted and cycle aborted, status bit is set in register of U-bus master
and a CPU interrupt is generated.

Type of Error: Read Response Error on prefetched read data
Action: Data is not loaded into the read buffer. If that data is subsequently requested, it
will again be fetched from memory and then causes a UBERR* if it is still bad.

Type of Error: An X-bus write attempted with the Protect bit set in the I/O Map
Action: UBERR* asserted and cycle aborted, status bit is set in the register of U-bus master
and a CPU interrupt is generated.

Type of Error: Time out (~3 msec) while waiting to acquire X-bus or waiting for a read
response from a read or read multiple command
Action: Clock Stop line is asserted.

Type of Error: Data parity error on a read or write command from X-bus
Action: Clock Stop line is asserted.

Type of Error: Error acknowledgement
Action: Clock Stop line is asserted.

Type of Error: Read Response received from the X-bus without a pending read operation
Action: Command is ignored and no acknowledgment is sent.

Type of Error: An unsupported command is received
Action: Command is ignored and no acknowledgment is sent.

The guidelines for error recovery are as follows:

• Errors that could be created by software are, in general, recoverable and,
therefore, passed to the operating system for analysis and attempts at recovery.

• Errors that are caused by hardware failures bring the system to a halt as soon as
possible for analysis by the SP.

• If it is unlikely that the software can recover from an error, the machine stops
before the real cause of the error is obscured by further processing or before a
fatal fault inadvertently causes loss of data.
2.9.1 X-Bus Errors

The following subsections describe the various types of X-Bus errors.

2.9.1.1 Timeouts

The system has timeout mechanisms that detect errant hardware and software. These
timeout mechanisms notify the user that an abnormal condition has occurred. Timeouts are
serious and are reported to the Service Processor for logging and/or further diagnostic
action. There is a timeout mechanism in each X-Bus device which notifies that device if it is
unable to gain access to the bus or if it has not received a response from a read operation.

The timeout period is greater than 3 milliseconds. This is long enough that only a serious
system failure could cause the timeout. The timeout sets a timeout flag in the device's
interface status register and then causes the state of the X-Bus interface to be frozen until
the condition is cleared via the scan loop mechanism, BUS RESET*, or RESET*. The
Service Processor is made aware of this situation so that the system doesn't simply lock up.
The Service Processor polls the interface status register on each device to find the source
of the timeout.



2.9.1.2 Lock Timeouts 

Lock timeout is handled via a timer located on each X-Bus device that can generate a bus
lock. The timer times the duration of a bus lock. If a lock has been held for more than
200 microseconds, a lock timeout is generated. This lock timeout releases the bus lock and
generates a trap to the processor to indicate that the action has taken place. The lock
timeout period is less than the bus access timeout period so that other devices on the
X-Bus, which are trying to acquire the bus lock, don't trip their bus access timeout
mechanisms as a result of some other device holding lock too long. This means that a processor
may generate lock timeout errors that are really bus access timeout errors, but that doesn't
cause any serious problems because the recovery software is aware of this possibility.

2.9.1.3 Parity Errors 

The parity error indicates that a serious system malfunction has occurred. This causes the
system to stop before additional damage to data or system state occurs. If an X-Bus device
detects a parity error on data that was sent to it, it sets a parity error flag in its interface
status register. It also responds with an error acknowledgement, and then stops processing
transactions either to or from the X-Bus. The error acknowledgement causes similar
actions in the sending device. The interface status registers can be read via the diagnostic bus
scan loops. Neither interface participates in further X-Bus transactions until the condition
is cleared via the scan loop mechanism, or RESET*. The Service Processor is made aware
of the error condition via the Diagnostic Bus HALT signal.
2.9.1.4 No Response Acknowledgement

If a device receives a no response acknowledgement (i.e., non-existent device) from a
transaction, it may attempt to recover, but the failure is likely to be caused by a fatal
problem. The no response may be the result of an address parity error, trying to access a
device that does not exist, or undefined, unsupported commands on the X-Bus. If the
requesting device is a processor, it traps to an error recovery procedure that tries to discover
if the problem is related to hardware or software. If the problem is hardware related, the
processor lets the Service Processor resolve the problem. If the requesting device is not a
processor, it sets a no response flag in its status register and waits for the Service Processor
to let it proceed.

If a device is selected, such as a memory controller, and the address that is presented is
not a valid address within that controller, the device responds with a no response
acknowledgement that indicates the address is not valid. This is preferable to the error
acknowledgement because there is a reasonable chance that this type of error is a software
error, not a hardware error, and possibly recoverable. Error acknowledgements are treated
as fatal and are used to indicate nonrecoverable hardware type error conditions.

2.9.2 Memory Errors 



2.9.2.1 ECCU on Write

The only situation that generates an ECCU on a memory write is when an attempt is made
to write a portion of a 32-bit word and the existing 32-bit word has two or more bits in
error. The write operation is terminated so that data is not destroyed. An interrupt
(WRITE command) is sent to a prespecified X-Bus address. When the processor responds
to this interrupt, it can read the address of the data which caused the ECCU. It can also
read the ID* and SUBID* of the device that generated the failing write operation.



2.9.2.2 ECCU on Read 

If an ECCU error is detected on a read operation, the read operation is treated as a
normal read operation except that a READ RESPONSE ERROR command is returned with
the data instead of a READ RESPONSE command. The data that was obtained on the
read is returned uncorrected as the data portion of the transfer. If the error was not
detected until the bad data was sent out, the REJ* signal is asserted to cancel that transfer.
The data is resent using the READ RESPONSE ERROR command code.



2.9.2.3 ECC on Read 

If a correctable error is detected on a read operation before the data is transferred, the
data is corrected and then sent to the requesting device. If the error is not detected until
after the data is sent onto the X-Bus, the REJ* line is asserted in the cycle after the data
is sent to cancel the transaction. Then the data is resent. In either case, a record of the
error is kept in a register that can be accessed by either the X-Bus interface or the
diagnostic processor via the scan loops. The information stays latched in this register until the
data is read out. This may cause information about later errors to be discarded.

2.9.3 VMEbus Errors

Most errors that occur on the VMEbus are signaled by asserting the VMEbus signal, BUS
ERROR. In most cases, this signal is asserted by the slave in the VMEbus transaction. It
may also be asserted by the bus timer which is located on the system controller module.
The bus timer function is implemented on the VMEbus interface and sets a bus timeout
flag whenever the VMEbus AS* (Address Strobe) is asserted for more than 100
microseconds. Depending on an enable bit, setting this flag may also cause an interrupt to the
device specified in the VMEbus interface's interrupt address register. There is also a bus
error flag in the VMEbus interface which is set when any BUS ERROR occurs on the
VMEbus. This flag also has an enable bit which allows generating an interrupt to the
device specified in the interrupt address register. These flags are part of the VMEbus
interface's status register and are reset whenever the register is read. Whenever a BUS ERROR
occurs, the address that was on the VMEbus is recorded in a bus error status register for
interrogation by an X-Bus device. This register locks up once an error has occurred and
does not record other error addresses until it has been read.
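
The lock-until-read behaviour of the bus error status register can be modelled as a tiny latch. The sketch below is an illustration only; the class and method names are hypothetical and are not taken from the VMEbus interface design.

```python
class BusErrorAddressRegister:
    """Toy model: holds the first error address until it has been read."""

    def __init__(self):
        self._address = None  # None means the register is unlocked

    def record(self, address: int) -> bool:
        """Record an error address; return False if the register is locked."""
        if self._address is not None:
            return False      # later error addresses are not recorded
        self._address = address
        return True

    def read(self):
        """Return the latched address (or None) and unlock the register."""
        address, self._address = self._address, None
        return address
```

A consequence of this scheme, visible in the model, is that a second BUS ERROR arriving before the first address is read leaves no trace of its own address.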

2.9.3.1 Bus Error on Read 

If an X-Bus device initiates a read operation on the VMEbus that results in a BUS
ERROR, the VMEbus interface does not respond to the read command with a READ
RESPONSE. Instead, it responds to the requesting device with a READ RESPONSE ERROR
command. The lower 32 bits of the returned data, DAT*(31:0), reflect whatever data was
on the VMEbus data lines at the time BUS ERROR was asserted. To determine if the BUS
ERROR was the result of trying to access a nonexistent device, the requesting device must
look at the bus timeout flag in the VMEbus interface's status register.



2.9.3.2 Bus Error on Write 

If an X-Bus device initiates a write operation on the VMEbus which results in a BUS
ERROR, the VMEbus interface sets a write bus error flag in its status register. It also sends an
interrupt to the X-Bus device specified in its interrupt address register. If the BUS ERROR
was a result of trying to access a nonexistent device, the bus timeout flag is also set.



2.9.3.3 Bus Error on Transfer not Initiated by an X-Bus Device 

If a BUS ERROR occurs while a VMEbus device is active on the bus, it notifies the
VMEbus interface of the condition via the normal interrupt mechanisms. In some cases,
such as an intelligent disk controller, the device may try to recover from the error without
intervention from the responsible X-Bus device. In this case, the X-Bus device would not
be aware of the error unless the bus error interrupt enable in the VMEbus interface were
set.
2.9.3.4 No IRQ Asserted During IACK* Cycle

If the VMEbus interface has sent an interrupt to an X-Bus device and there is no IRQ
line from the VMEbus asserted when the X-Bus device responds with an IACK* cycle, the
VMEbus interface does not perform a VMEbus IACK* cycle. It also returns the status
register with a READ RESPONSE command and an interrupt dropped flag set in the status
register. The processor treats this condition as a spurious interrupt.



2.10 Physical Address Space 

The X-Bus physical address space is 30 bits wide, or 1 gigabyte of physical memory and
device space. Table 2-6 shows how it is partitioned.

Table 2-6. X-Bus Physical Address Space

Address                   Device

00,000,000-00,3FF,FFF     4 MB, Processor Registers
00,400,000-07,FFF,FFF     124 MB, Reserved
08,000,000-0F,FFF,FFF     128 MB, Service Processor
10,000,000-17,FFF,FFF     128 MB, Reserved
18,000,000-1F,FFF,FFF     128 MB, Reserved
20,000,000-2F,FFF,FFF     256 MB, Memory No. 1
30,000,000-3F,FFF,FFF     256 MB, Memory No. 2
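
As a cross-check of the partitioning, the ranges of Table 2-6 can be written as a lookup table. The sketch below is illustrative: the names `REGIONS` and `decode` are hypothetical, and the addresses are simply the hexadecimal values of the table without the digit-group separators.

```python
# (first address, last address, device) taken from Table 2-6.
REGIONS = [
    (0x00000000, 0x003FFFFF, "Processor Registers"),
    (0x00400000, 0x07FFFFFF, "Reserved"),
    (0x08000000, 0x0FFFFFFF, "Service Processor"),
    (0x10000000, 0x17FFFFFF, "Reserved"),
    (0x18000000, 0x1FFFFFFF, "Reserved"),
    (0x20000000, 0x2FFFFFFF, "Memory No. 1"),
    (0x30000000, 0x3FFFFFFF, "Memory No. 2"),
]


def decode(addr: int) -> str:
    """Return the Table 2-6 device name for a 30-bit physical address."""
    if not 0 <= addr < (1 << 30):
        raise ValueError("address is outside the 30-bit X-Bus space")
    for first, last, device in REGIONS:
        if first <= addr <= last:
            return device
    raise AssertionError("unreachable: the regions cover the whole space")


# The region sizes add up to exactly 1 gigabyte (2**30 bytes).
TOTAL_BYTES = sum(last - first + 1 for first, last, _ in REGIONS)
```

The seven regions are contiguous and their sizes (4 + 124 + 128 + 128 + 128 + 256 + 256 MB) sum to the full 1 gigabyte space.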
Chapter 3

U-Bus Interface



This chapter describes the U-Bus and the U-Bus interface to the X-Bus.



3.1 Utility Bus (U-Bus)

The Utility board contains several functional subsystems that are essential to the Series
10000 processing system. It contains the VMEbus and PC AT compatible bus interfaces,
power supply interface, control panel interface, the Serial Input/Output (SIO) line
interfaces, timers, calendar, the Service Processor (SP), and the Diagnostic Bus (D-Bus)
interface. The system's clock generation circuitry is also located on the Utility board.

The VMEbus and PC AT compatible bus interfaces, the SIO line interfaces, timers, and
calendar are connected to the core system's internal bus (X-Bus) interface via the Utility
board's internal 32-bit bus (U-Bus). The Service Processor (SP) and its associated
memory are connected to the U-Bus, but are independent of it, allowing the SP to access the
D-Bus interface without interfering with the U-Bus operations. The I/O map and the
VMEbus address modifier tables are also accessed through the U-Bus.

The Utility board's internal bus, the U-Bus, streamlines the X-Bus to VMEbus interface,
since most of the high data rate transfers occur between these two functional units. This
architecture also simplifies the SP to U-Bus interface.

There are five interfaces that arbitrate for use of the U-Bus. These include the X-Bus
interface, the SIO interface, the SP interface, the VMEbus interface, and the PC AT
compatible bus interface. During normal system operation, the SP does not need access to the
U-Bus. However, the SP memory is loaded via the U-Bus. The SP must also have access
to the devices on the U-Bus for diagnostic purposes.
APPENDIX III

Chapter 12 

CPU to X-Bus Interface 



12.1 Overview 

The system's CPU X-Bus Interface (BIF) connects the processor's instruction and data
caches to the system backplane bus. The principal functions of the BIF unit are:

• Support the X-Bus reads necessary to fill the instruction and data caches.

• Queue and deliver processor stores to the X-Bus, isolating the CPU from X-Bus
write latencies.

• Act as a bus watcher and ensure cache coherency in the face of external stores.

• Act as a clearing house for system communications, such as interrupts, to and
from the CPU.

• Maintain and check CPU cache data parity.

Also, the BIF provides much of the support logic for the self-test of the CPU cache
RAMs.



12.2 BIF Block Diagram 

The BIF is composed of 3 gate arrays. The bus interface logic also includes the instruction
and data cache duplicate tag stores, the X-Bus interface transceivers, and some supporting
tristate drivers.

The address gate array (CBA) handles outgoing and inbound address transfers. Outgoing
address transfers occur for instruction and data cache read issues, and for data cache write
issues. Inbound address transfers are required for cache entry invalidation caused by
external writes, and for cache miss filling. The CBA gate array also maintains the duplicate tag
stores and handles all bus watching. Finally, the CBA gate array accepts and forwards
interrupt requests to the processor.

The two data gate arrays (CBDs) are identical. One transfers even bytes, and the other
transfers odd data bytes. The CBD gate arrays queue and forward write data, and return
read data. The CBD gate arrays check and maintain the cache parity. Figure 12-1 shows
the processor block diagram and illustrates this partition.
12.3 Bus Interconnect

The BIF accepts and returns processor addresses from the PA, EASRC, PCSRC, and VPN
bus registers. The BIF also accepts and returns data from the processor INST and DATA
bus registers. It uses the X-Bus as its path to main memory.

For a data cache read miss, the MMU gives the physical address to the BIF over the PA
bus. The accompanying VPN is captured by the BIF directly from the EAVPN bus. When
the cache fill begins, the BIF supplies the cache index to the EASRC bus over the PA bus.
The memory data is supplied directly to the cache DATA bus.

For an instruction cache read miss, the MMU provides the physical address to the BIF
over the PA bus. The BIF captures the accompanying VPN directly from the
PCVPN bus. When the cache fill begins, the BIF supplies the cache index to
the PCSRC bus over the PA bus. The memory data is supplied directly to the cache INST
bus.

For a data cache write, the MMU provides the physical address to the BIF over the PA bus.
The BIF captures the accompanying VPN directly from the EAVPN bus. In this case, the
store data has previously been captured by the BIF directly from the DATA bus. When an
external write requires purging a local cache entry, the BIF supplies the invalidate address
to the MMU over the PA bus.



12.4 X-Bus Arbitration 

All X-Bus interfaces except the default owner must request the bus prior to use. There is
one bus request level on the backplane for each X-Bus device. Devices are grouped into
two classes. Class A devices are awarded the bus in strict priority order. Class B devices
participate in fair arbitration and may also be default bus owners. CPUs are Class B
devices.

Bus arbitration is decentralized. Every bus interface decides for itself whether it has gained
access to the X-Bus. Bus arbitration can be inhibited by asserting the ARB_INHIBIT
backplane signal. Only the current owner of the bus may assert this signal. The current owner
does so if the intended bus transfer requires multiple cycles.

12.4.1 Class A Request Override 

To request the bus, a Class A device asserts both its assigned request level and the bus
request sum line on the bus. When the BIF detects the bus request sum assertion in an
active bus arbitration cycle, it defers to the Class A device(s).
12.4.2 Class B/CPU Requesting

The Class B devices, the four CPUs, also have fixed priority assignments. Priority
assignments are 0 through 5, with 5 being the highest priority. The assignment is scanned into
the BIF and used to determine which of the four Class B request parallel backplane signals
each CPU uses. The CPU drives its assigned level, and defers to requestors at higher
levels.

Class B devices exercise fair arbitration, and don't reassert their request lines on demand.
Instead, Class B devices snapshot all other lower priority Class B request lines during the
final cycle of a bus ownership. The Class B device then relinquishes the bus and doesn't
reassert a request line until all the snapshotted requests are satisfied. The Class B devices
observe the current state of the other request lines to determine that the other requestors
have been serviced. When a request line is deasserted, service is underway or completed.
If a request line is still asserted, but arbitration is enabled, that requestor wins and service
resumes.
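
The snapshot rule can be sketched as a small state machine per device. This is a simplified illustration, not the BIF logic: the class `ClassBDevice`, its method names and the representation of the request lines as a dictionary from priority level to a boolean are all assumptions made for the example.

```python
class ClassBDevice:
    """Toy model of the fair-arbitration snapshot kept by one Class B device."""

    def __init__(self, level: int):
        self.level = level        # fixed priority level of this device
        self.waiting_for = set()  # snapshotted lower-priority requestors

    def end_ownership(self, request_lines: dict) -> None:
        # Final cycle of ownership: snapshot lower-priority asserted requests.
        self.waiting_for = {
            lvl for lvl, asserted in request_lines.items()
            if asserted and lvl < self.level
        }

    def may_request(self, request_lines: dict) -> bool:
        # A deasserted line means that requestor is being or has been served.
        self.waiting_for = {
            lvl for lvl in self.waiting_for if request_lines.get(lvl, False)
        }
        return not self.waiting_for
```

Because a level is dropped from the snapshot as soon as its line is seen deasserted, a lower-priority device that requests again later does not hold the higher-priority device back a second time.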

12.4.3 Default Ownership

When the bus is otherwise idle, the last successful bidder among the Class B requestors
remains as the default bus owner. The default bus owner may use the bus at the end of
any cycle during which no other request line was asserted. The default bus owner does not
have to assert its request line. The default remains in effect until another Class B device
wins the bus.

A Class B device's bus ownership may be suspended by a Class A device. If a Class A
device assumes control of the bus, the former Class B owner device waits for the bus to
become idle again before reclaiming bus ownership (i.e., the Class B device reassumes
ownership in the cycle following one during which arbitration was permitted, but does not assert
its request line). If another Class B device wins the bus before it becomes idle, default bus
ownership transfers to the latest Class B bus owner.
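
The interaction between Class A override, Class B priority and default ownership can be summarized for a single arbitration cycle as follows. The function `arbitrate` and its return convention are hypothetical, and the sketch deliberately ignores the fair-arbitration snapshot and the arbitration inhibit signal; it only shows who may use the bus next and who is the default owner afterwards.

```python
def arbitrate(class_a_request: bool, class_b_requests: set, default_owner: int):
    """Return (next bus user, default owner) for one enabled arbitration cycle.

    class_b_requests holds the levels of the asserted Class B request lines;
    a higher level means a higher priority.
    """
    if class_a_request:
        # Class A overrides Class B; the Class B default owner is unchanged
        # and reclaims the bus once it is idle again.
        return "class A", default_owner
    if class_b_requests:
        # The highest asserted Class B level wins and becomes the new default.
        winner = max(class_b_requests)
        return winner, winner
    # Idle bus: the default owner may use it without asserting a request.
    return default_owner, default_owner
```

For example, `arbitrate(False, set(), 2)` leaves device 2 free to use the bus in the next cycle, which is the speedup that default ownership provides.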

12.4.4 Acquisition Timeout

When a BIF first asserts a bus request line, it starts a timer. If the timer elapses before the
bus is acquired, a bus acquisition timeout occurs. The bus timeout duration is
approximately 3.2 milliseconds (16-bit counter). If a timeout occurs, the system is assumed
broken and a clock freeze request is made of the SCR. The internal BIF state is preserved as
much as possible.

The timer is not stopped until either a NOACK or ACK signal is received for the request
address transfer. The timer, therefore, expires if a device is continually busy. Broadcast
transfers, such as TB invalidates, stop the timer regardless of the acknowledge line state.
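
The two figures quoted for the timer allow a rough estimate of its tick period. The calculation below assumes that the counter runs through its full 16-bit range at a constant rate before expiring; the text states only the approximate duration and the counter width, so the result is an estimate, and the function name is hypothetical.

```python
def implied_tick_ns(timeout_s: float = 3.2e-3, counter_bits: int = 16) -> float:
    """Tick period in nanoseconds implied by a full-range counter timeout."""
    ticks = 2 ** counter_bits        # 65536 counts for a 16-bit counter
    return timeout_s / ticks * 1e9   # seconds per tick, converted to ns


tick_ns = implied_tick_ns()          # roughly 48.8 ns under these assumptions
```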
12.4.5 Local Request Prioritization

Three competing local requestors are internal to the BIF. They include data cache read,
data cache write, and instruction cache read. Data cache read is prioritized over instruction
cache read. In turn, instruction cache read is prioritized over data cache write. The
following list contains exceptions to these rules:

• If the write data queue is full, data cache write is prioritized over an instruction
cache miss.

• If a data cache miss collides in address with a previously queued write, data cache
write is given priority over both data and instruction cache misses.

• If a write to an unencacheable memory location is queued, data cache write is
given priority over both data and instruction cache misses.

• If a write and unlock is queued, data cache write is given priority over both data
and instruction cache misses.

• If a data cache miss from an unencacheable memory location is posted, data
cache write is given priority over both data and instruction cache misses.

• If a data cache miss and lock is posted, data cache write is given priority over
both data and instruction cache reads.

• If a data cache miss and unlock is posted, data cache write is given priority over
both data and instruction cache reads.

• If a TB invalidate is queued in the write buffer, data cache write is given priority
over both instruction and data cache misses.

A locally generated READ RESPONSE required for a BIF CSR read is given precedence
over all other transmitters.
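
The rules above amount to a short priority chain. The sketch below is one possible reading, not the gate array logic: `BifState`, its field names and `select_request` are hypothetical, and the single flag `write_first` stands for any of the listed exceptions that place data cache write ahead of both kinds of cache miss.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BifState:
    data_read_pending: bool = False
    inst_read_pending: bool = False
    data_write_pending: bool = False
    csr_read_response_pending: bool = False
    write_queue_full: bool = False
    write_first: bool = False  # any "write over both misses" exception holds


def select_request(s: BifState) -> Optional[str]:
    """Pick the next local request to send, or None if nothing is pending."""
    if s.csr_read_response_pending:
        return "read response"        # precedence over all other transmitters
    if s.data_write_pending and s.write_first:
        return "data write"           # exceptions: write ahead of both misses
    if s.data_read_pending:
        return "data read"
    if s.data_write_pending and s.write_queue_full:
        return "data write"           # full queue: ahead of instruction miss
    if s.inst_read_pending:
        return "instruction read"
    if s.data_write_pending:
        return "data write"
    return None
```

Collapsing the seven write-first exceptions into one flag keeps the ordering visible; a fuller model would compute that flag from the individual queue and miss conditions.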



12.4.6 Subsequent Request Arbitration Delay 

The BIF issues subsequent requests from the data cache every other bus cycle (or later).
This assures write order between processors, and read-write order within one processor.
The instruction cache miss request is not restricted to every other cycle. For load and lock,
load and unlock, and store and unlock, subsequent requests are not issued until a
successful bus acknowledge of the prior request is received.

The BIF issues subsequent requests from a CPU every other bus cycle (or later). This
assures write order. For load and lock, load and unlock, and store and unlock, subsequent
requests are not issued until a successful bus acknowledge of the prior request is received.






EP 0 366 434 A2 



Apollo Preliminary and Confidential 



12.5 X-Bus Reads 

X-Bus reads are split into two parts: address transfer and data return. The BIF arbitrates
for an address transfer to initiate a data or instruction cache miss. The bus interface then
awaits data return. The BIF arbitrates for data return only when responding as a slave to a
CSR read.

12.5.1 Read Initiating

When the BIF wins the bus and decides that a read is the highest priority task, it transfers
the read address and issues either a READ or a READ MULTIPLE command. It issues a
READ command if the CPU request is less than or equal to 32 bits, and was either
unencacheable or would change the bus lock status. The BIF issues a READ MULTIPLE
command otherwise.

If the request is a READ, the byte mask accompanying the address decides the exact
request size.

If the request is a READ MULTIPLE, additional request information is provided in the
address and data fields. The WD field is always 00. The following settings are used for the
LL field:

00 64-bit Read

01 Data Cache Normal Fill

10 Instruction Cache Fill

11 Extended Data Cache Fill

The LONGWORD COUNT field is always equal to 0000 0010.



X-Bus Read Multiple

Bit boundaries shown: 63 62 | 61 34 | 33 32 | 31 08 | 07 00
Labelled fields: Physical Address (upper longword), Longword Count (bits 07:00)

LL field                               WD field
00  Transfer Length = 2 Longwords      00  Use Longword Count, Modulo Wrap
01  Transfer Length = 4 Longwords      01  Length Specified by LL, Modulo Wrap
10  Transfer Length = 8 Longwords      10  Use Longword Count
11  Transfer Length = 16 Longwords     11  Length Specified by LL

Figure 12-2. X-Bus Read Multiple
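
As a rough illustration of the LL encoding, the sketch below pairs each LL code with the request type named in the text and the transfer length given in the figure. The table name and helper are hypothetical. The 8 and 16 longword entries are read from a damaged figure and are assumed to continue the 2, 4 doubling sequence. A longword is taken to be 4 bytes, since a 64-bit read is listed as 2 longwords.

```python
# LL code -> (request type from the text, transfer length in longwords from the figure)
LL_FIELD = {
    0b00: ("64-bit Read", 2),
    0b01: ("Data Cache Normal Fill", 4),
    0b10: ("Instruction Cache Fill", 8),      # length assumed from the doubling sequence
    0b11: ("Extended Data Cache Fill", 16),   # length assumed from the doubling sequence
}

LONGWORD_BYTES = 4  # 64-bit read == 2 longwords, so one longword is 32 bits


def ll_transfer_bytes(ll: int) -> int:
    """Return the transfer size in bytes implied by an LL code (hypothetical helper)."""
    _name, longwords = LL_FIELD[ll & 0b11]
    return longwords * LONGWORD_BYTES
```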
There can be multiple reads outstanding on the X-Bus from a single CPU. In such a case,
returning read data is distinguished by the sub-ID field. Sub-ID = x0 is used for the data
cache. Sub-ID = x1 is used for the instruction cache.

The read address is sourced by the CBA gate array. The CBD gate arrays provide the virtual
page offset within segment (VPN). When the read address is transferred, the CBA
gate array captures the associated VPN for subsequent use during cache fill and DTS update.



12.5.1.1 Read Initiation Bypass 

When a read MMU command is being decoded by the BIF and there are no previous
internal requests pending, the arriving PA is immediately forwarded to the X-Bus outbound
address register. If the BIF is the default bus owner, no external bus requests are pending,
and internal request initiation is not suspended for any reason, the read request is initiated
in the following bus cycle.

12.5.2 Read Data Return 

After the BIF initiates a bus read, it waits for the return of read data. Several outcomes
are possible: data returns as expected, data returns in error, and data fails to return.

The expected data return is either one (READ) or more (READ MULTIPLE) data transfers
identified as READ RESPONSES. The returning data appears on the 64-bit bus
aligned as if in memory. Byte 000, if present, is in bit positions 63:56, and so on. If multiple
READ RESPONSE cycles are expected, they are either immediately abutting or have
intervening NOPs. If there are intervening NOPs, there are always at least 2 such NOPs, and
ARB INHIBIT is asserted by the responder to prevent any intervening unrelated bus operations.

If bad data is returned, the accompanying command code is READ RESPONSE ERROR.
This may be caused by detecting an uncorrectable ECC or parity error. It may also occur
because of a bus timeout or address error in the responding device. No further data is
returned subsequent to a READ RESPONSE ERROR. A READ RESPONSE ERROR may
occur in any cycle of a multiple transfer read return bus sequence.

The last possible outcome for a read is for the read data to fail to return. This can only
happen in the presence of a hardware failure.

12.5.3 Read Return Timeout

The failure of read data to return is detected when the BIF's bus timer expires while a read
request remains outstanding on the bus. This is the same timer used in bus acquisition
timeout. As mentioned earlier, the timer is started when any request is posted. If arbitration
succeeds and a write or TB invalidate follows, the timer is stopped after receiving
either an ACK or a NOACK acknowledge. If arbitration succeeds and a read issue follows,
the timer is continued. If the timer then expires before the last read data returns, a read
return timeout occurs. If a timeout occurs, the system is assumed broken and a clock
freeze request is made to the SCR. The internal BIF state is preserved as much as possible.

If two reads are concurrently outstanding, the timer is restarted when read data return
completes for each request. This results in a somewhat longer timeout for the second read
request.

If a second request (read, write, or TB invalidate) is issued while a read is outstanding, the
timer is not stopped. This results in a shorter bus acquisition timeout for these subsequent
requests that expires coincidentally with the read data return timeout.

12.5.4 Read Return Minimum Time

The READ RESPONSE for a READ or READ MULTIPLE command must occur no
sooner than the first cycle after the acknowledge cycle for the address transfer. This is also
the minimum time possible within the bus protocol (except for default bus owners).

12.5.5 Read Return Acknowledge 

The BIF either successfully acknowledges, or error acknowledges, a READ RESPONSE
addressed to it. If it error acknowledges, it forwards the returning data as if correct to the
data or instruction caches. The BIF records the error status in the embedded scan state
and requests a clock freeze of the SCR.



12.6 X-Bus Writes 

When the BIF wins the bus and decides that a write is the highest priority task, it transfers
the write address and data, and sends a WRITE or a WRITE MULTIPLE command. The
BIF issues a WRITE command if the data to be transferred is less than or equal to 32 bits.
The BIF issues a WRITE MULTIPLE command if the data to be transferred is 64 bits or
more.

If the request is a WRITE, the data accompanies the address. The associated byte mask
decides the exact request size.

If the request is a WRITE MULTIPLE, the address and transfer direction are sent in the
first cycle. Bit 32 is 0 if the address is ascending, and bit 32 is 1 if the address is descending.
The second and subsequent cycles transmit 64 bits of data accompanied by a WRITE
DATA command. All transfers begin and end on quadword boundaries.

12.6.1 X-Bus Write Multiple Limit

The BIF continually monitors its internal write address and data queue to determine if the
next write data to be transferred is an adjacent address quadword. If so, it sustains the
write multiple. To prevent excessive bus use by one processor, the BIF stops a write multiple
arbitrarily at every 256 byte boundary (32 transfers). Write multiple data is always sent
in immediately adjacent bus cycles.

The BIF does not generate odd longword start write multiples.
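
The 256-byte limit can be illustrated with a short sketch that splits a run of adjacent ascending quadwords into bursts. The function name and its return shape are hypothetical. The only facts taken from the text are the 8-byte quadword transfer and the stop at every 256-byte boundary (32 transfers).

```python
QUADWORD_BYTES = 8      # each WRITE DATA cycle carries 64 bits
BURST_BOUNDARY = 256    # a write multiple is stopped at every 256-byte boundary


def split_write_multiple(start_addr: int, n_quadwords: int) -> list[tuple[int, int]]:
    """Split an ascending run of adjacent quadwords into (start, count) bursts.

    No burst crosses a 256-byte boundary, so a burst holds at most 32 transfers.
    """
    if start_addr % QUADWORD_BYTES:
        raise ValueError("write multiples begin on quadword boundaries")
    bursts = []
    addr, remaining = start_addr, n_quadwords
    while remaining > 0:
        # Quadwords left before the next 256-byte boundary.
        to_boundary = (BURST_BOUNDARY - addr % BURST_BOUNDARY) // QUADWORD_BYTES
        count = min(remaining, to_boundary)
        bursts.append((addr, count))
        addr += count * QUADWORD_BYTES
        remaining -= count
    return bursts
```

For example, 40 queued quadwords starting at address 0 would go out as one burst of 32 followed by one burst of 8, with a fresh arbitration in between.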



12.6.2 X-Bus Initial Write Hold Off 

The BIF does not attempt to transfer write data as soon as the request is posted. Rather, it
delays the transfer, anticipating that subsequent writes to adjacent addresses are likely. The
request is finally posted only if one of the following conditions is true:

• If a second write to any address is queued.

• If the pending write was not encacheable.

• If the pending write would unlock the bus.

• If there is a pending data cache miss, which collides in address with the pending
write.

• If there is a pending data cache miss that is unencacheable or would change the
bus lock status.

• If the free running BIF counter overruns (safety measure).

• If the write is really a TB invalidate.
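
The hold-off conditions above amount to a logical OR over seven inputs. A minimal sketch follows. The `HoldoffState` record and its field names are invented for illustration and do not correspond to named signals in the BIF.

```python
from dataclasses import dataclass


@dataclass
class HoldoffState:
    """Hypothetical snapshot of the inputs that end the initial write hold-off."""
    second_write_queued: bool = False
    write_encacheable: bool = True
    write_unlocks_bus: bool = False
    miss_collides_with_write: bool = False
    miss_unencacheable_or_lock_change: bool = False
    counter_overrun: bool = False
    is_tb_invalidate: bool = False


def should_post_write(s: HoldoffState) -> bool:
    """Return True when any one of the listed conditions holds."""
    return any((
        s.second_write_queued,
        not s.write_encacheable,
        s.write_unlocks_bus,
        s.miss_collides_with_write,
        s.miss_unencacheable_or_lock_change,
        s.counter_overrun,          # safety measure: never hold a write forever
        s.is_tb_invalidate,
    ))
```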

12.6.3 X-Bus Write Monitoring

All X-Bus writes are monitored even if they are not directed to, or originated by, the local
BIF. The BIF determines if a copy of the data at the write address has been locally
cached. If so, the BIF schedules an invalidate of that cache entry. The BIF maintains
duplicate tag stores.



12.6.4 X-Bus Writes To BIF CSRs 

When the BIF detects a 32-bit write into its own register range, it substitutes a WRITE
MULTIPLE of 2 longwords for a WRITE command.

12.6.5 X-Bus Write Multiple Acknowledge

The acknowledge for the WRITE MULTIPLE command is correct only when the slave can
accept at least the first 64 bits of data. The acknowledge for the WRITE DATA command
associated with a write multiple is busy if the associated bits of data cannot be accepted
and must be retransmitted.

An error or no acknowledge for a WRITE DATA command is interpreted as a busy
acknowledge to preserve state. When this is encountered, the acknowledge driver freezes the
clocks.



12.7 X-Bus Slave Response: CSR Access, Interrupt Posting 

The BIF holds 4 operationally available registers: ERRADDR, BUS_CSR, ICTRL and
ISUM. These registers can be accessed over the X-Bus. In addition, the BIF posts interrupts
to the local processor in response to bus writes. The following addresses are those to
which the BIF responds as a slave device:

00pp 0200                Interrupt Summary Register (ISUM)
00pp 0208                Interrupt Control Register (ICTRL)
00pp 0210                Bus Control Register (BUS_CTRL)
00pp 0218                Bus Error Address Register (ERRADDR)
00pp 0220                Process Timer (PROC_TIMER)
00pp 0100 - 00pp 013C    Interrupt Posting Addresses

NOTE: pp = Processor number
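
A slave-address decode along these lines might look like the sketch below. It is illustrative only: it assumes each address is a 32-bit hexadecimal value with the processor number pp in bits 23:16. It also assumes the Interrupt Control Register sits at offset 0208, to match the 1208 address given for INTCTL in the register chapter. The function and table names are hypothetical.

```python
# Offsets within a processor's slave range (0208 for ICTRL is an assumption).
CSR_OFFSETS = {
    0x0200: "ISUM",
    0x0208: "ICTRL",
    0x0210: "BUS_CTRL",
    0x0218: "ERRADDR",
    0x0220: "PROC_TIMER",
}


def decode_slave_address(addr: int, processor: int):
    """Return the name of the addressed BIF resource, or None if not ours."""
    # Assumed layout: bits 31:24 are zero, bits 23:16 hold the processor number.
    if addr >> 24 != 0 or (addr >> 16) & 0xFF != processor:
        return None
    offset = addr & 0xFFFF
    if offset in CSR_OFFSETS:
        return CSR_OFFSETS[offset]
    # Sixteen longword interrupt posting addresses, 0100 through 013C.
    if 0x0100 <= offset <= 0x013C and offset % 4 == 0:
        return "INTERRUPT_POST"
    return None
```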

12.7.1 X-Bus Slave Response: CSR Read Return 

The BIF decodes all incoming read requests. If the address matches one allotted to the
interface, it returns 32 bits of read data. The data is returned in bit positions 63 through
32. The BIF sometimes delays register read data response so that the read data is returned
no sooner than the fourth cycle after the one that provided the read address. This is only
necessary when the BIF is the default bus owner.

The BIF gives a busy response when a second X-Bus read request arrives for a register
which has an X-Bus read underway. Otherwise, it accepts all read requests.

The BIF gives a no response when the read request is for anything other than 32 bits.

12.7.2 X-Bus Slave Response: CSR Write Accept, Interrupt Posting

The BIF decodes all incoming write requests and, if the address matches one allotted to
the interface, acknowledges the request.

If the address is one of the interrupt posting locations, a WRITE command is expected. In
this case, the data and byte mask are not interpreted.

If the address is one of the accessible CSRs, a WRITE MULTIPLE command is expected.
A request length of 1 or 2 longwords is expected with the data provided in bit positions 63
through 32 of the first WRITE DATA command. This is necessary because of the positioning
of the CSR registers in the CBA IC.

The BIF gives a busy acknowledge when an X-Bus write request of any type arrives for a
register which has an X-Bus read underway.

The BIF gives an error acknowledge when it detects a parity error in write data. A
WRITE MULTIPLE to an interrupt posting address, or a simple WRITE directed at a CSR
also generates an error acknowledgement. In either case, embedded state is set and a
clock freeze request to the SCR is generated.



12.8 X-Bus TB Invalidates

The local processor can issue TB invalidates for broadcast over the X-Bus. The BIF accepts,
queues and delivers to the X-Bus TB invalidates as if they were writes.

12.8.1 X-Bus TB Invalidate Issuing

The BIF transmits TB invalidate requests accompanied by the commands INVAL TB SEL
and INVALIDATE TB. If the former command is issued, the address field holds the virtual
page address of the entry to be invalidated. The virtual page number, address bits 31
through 12, can be found on the bus in bit positions 63 through 44.

No acknowledge is expected or awaited when a TB invalidate command is issued.



63                                   44                32
Virtual Page Number (address bits 31:12)



Figure 12-5. INVALIDATE TB Command Virtual Page Number Field

The Virtual Page Number is transferred on X-Bus bits 63:44 during INVAL TB SEL and
INVALIDATE TB commands.
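
The bit placement stated above (address bits 31:12 carried on bus bits 63:44) can be written out as two small helpers. The function names are hypothetical. The rest of the 64-bit bus word is left as zero here because the text does not describe it.

```python
VPN_BITS = 20          # address bits 31:12
VPN_BUS_SHIFT = 44     # VPN occupies bus bits 63:44
VPN_MASK = (1 << VPN_BITS) - 1


def vpn_to_bus(virtual_address: int) -> int:
    """Place the virtual page number of an address on bus bits 63:44."""
    vpn = (virtual_address >> 12) & VPN_MASK
    return vpn << VPN_BUS_SHIFT


def vpn_from_bus(bus_word: int) -> int:
    """Recover the 20-bit virtual page number from a 64-bit bus word."""
    return (bus_word >> VPN_BUS_SHIFT) & VPN_MASK
```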
12.8.2 X-Bus TB Invalidate Accepting

The BIF unconditionally accepts all X-Bus TB invalidate requests and forwards them to
the MMU through the invalidate queueing mechanism.



12.9 X-Bus Locking 

The BIF accepts load lock, load unlock and store unlock commands from the MMU.
When load lock completes successfully, that CPU can hold the bus lock until the CPU
explicitly releases the lock, or an error arises. Only one CPU at a time may hold the bus
lock. That, in turn, permits the construction of critical code sections in a multiple
processor environment.

12.9.1 X-Bus Lock Acquisition and Release

The BIF secures the bus lock only when a load lock data cache miss is successfully issued
and acknowledged on the X-Bus. In more detail, first the data cache miss which seeks
the bus lock is posted. This request pushes all previously queued writes ahead of itself.
When the lock request is next to be serviced, the current state of the external bus lock
signal is examined. If lock is already asserted by another CPU, the arbitration is deferred.
If the bus lock is available, arbitration is attempted. If the bus lock signal is subsequently
asserted before the BIF gains access to the X-Bus, the BIF withdraws from further arbitration.
When the bus is finally secured, the ARB INHIBIT and lock signals are simultaneously
asserted. ARB INHIBIT remains asserted for 3 cycles. This is sufficient time for all other
bus interfaces to see the lock signal asserted and withdraw from arbitration if they too plan
to secure the bus lock. At the end of 3 cycles, the locking BIF also examines the state of
the acknowledge signals. If other than a successful acknowledge is detected, the bus lock is
immediately released. If released, the lock signal is deasserted at the end of the cycle
following the acknowledge.

The BIF releases the bus lock when a load unlock or a store unlock is successfully issued
and acknowledged. Alternatively, the lock is released upon an error in the local processor.
A local processor error results in a processor trap. The signal trap dispatch is, therefore,
used to unconditionally release the bus lock. In more detail, the data cache read or write
which seeks to release the bus lock is posted. This request pushes all previously queued
writes ahead of itself. At the end of 3 cycles, the locking BIF examines the state of the
acknowledge signals. If other than a successful acknowledge is detected, the bus lock is
retained. Otherwise, the lock signal is deasserted at the end of the cycle following the
acknowledge.

If the BIF rejects a lock request (REJECT signal), the lock signal and ARB INHIBIT are
immediately released. Similarly, if the BIF rejects an unlock request (REJECT signal), the
lock is retained.






12.9.2 X-Bus Lock Nesting 

The MMU can request the bus lock for PMAPE update while the BIF possesses the bus
lock. For this reason, a second load lock request can be accepted. If two bus lock requests
have been accepted, two bus unlock requests need to follow before the lock is actually
released. In effect, the BIF nests bus lock requests two levels.
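
The two-level nesting behaves like a small saturating counter. The sketch below is a software model only. The class and method names are hypothetical, and refusing a third lock request is an assumption, since the text does not say what happens beyond two levels. The trap-dispatch release follows the statement that trap dispatch unconditionally releases the bus lock.

```python
class NestedBusLock:
    """Software model of the BIF's two-level bus lock nesting (illustrative)."""

    MAX_DEPTH = 2  # "the BIF nests bus lock requests two levels"

    def __init__(self) -> None:
        self.depth = 0

    @property
    def held(self) -> bool:
        return self.depth > 0

    def load_lock(self) -> bool:
        """Accept a lock request; a third nested request is refused (assumption)."""
        if self.depth >= self.MAX_DEPTH:
            return False
        self.depth += 1
        return True

    def unlock(self) -> bool:
        """Process an unlock; return True only when the lock is actually released."""
        if self.depth == 0:
            return False
        self.depth -= 1
        return self.depth == 0

    def trap_dispatch(self) -> None:
        """A processor trap releases the lock regardless of nesting depth."""
        self.depth = 0
```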

12.9.3 X-Bus Lock Duration Timeout

The BIF starts a timer when the bus lock is first acquired. The timer remains running as
long as the BIF holds the bus lock. If the timer expires before the lock is released, a lock
timeout trap is posted. The timer duration is approximately 200 microseconds (U-bit
counter).

The BUS_CSR register indicates when a timeout trap occurs. If a second lock setting
request is processed before a held lock is released, the timer is not reset. This results in a
shorter timeout for the second request. If an unlock request is being transferred upon the
X-Bus, the BIF does not arbitrate for a new lock request for at least five cycles, including
the transferring one. This delay assures that there are always two cycles of delay between
the release of a lock and its reacquisition by the same BIF.

12.9.4 X-Bus Data Consistency Under Lock

The BIF guarantees that, once a lock has been acquired, all writes on the bus that preceded
the load lock transfer have successfully invalidated the cache. This is a natural outcome
of an X-Bus READ command requiring at least 4 cycles before the READ RESPONSE
command is seen.



12.10 X-Bus Request Retry 

The BIF retries any request that receives a BUSY acknowledge. The retry continues until 
the bus timeout expires. 

If an address transfer receives a BUSY acknowledge, the request is marked as in retry.
There can be as many as three requests in retry at any one time. Retry requests receive no
different priority treatment, other than following retry holdoff.

12.10.1 X-Bus Retry Holdoff

If a request is in retry, it is not necessarily posted to the bus immediately. The retry interval
is a random function over a bound that geometrically increases to a maximum spread
of 1.6 microseconds. The function is derived from the free running BIF counter. If multiple
requests are in retry at once, they share the holdoff timing.

The minimum request spacing for an immediate retry is 5 cycles. Three cycles make the
original transfer and await the acknowledge. One cycle marks the request as in retry. The
last cycle rearbitrates for the bus.
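
The holdoff resembles a randomized backoff with a geometrically growing, capped window. The sketch below illustrates that general idea only. The 1.6 microsecond cap is taken from the text, while the doubling ratio, the 100 ns starting window, and the use of a software random generator in place of the free running BIF counter are all assumptions.

```python
import random

MAX_SPREAD_NS = 1600  # maximum spread of 1.6 microseconds (from the text)


def holdoff_bound_ns(attempt: int, initial_ns: int = 100) -> int:
    """Upper bound of the holdoff window for a 0-based retry attempt.

    Assumed to double on each attempt until it reaches the 1.6 us cap.
    """
    return min(initial_ns << attempt, MAX_SPREAD_NS)


def retry_holdoff_ns(attempt: int, rng: random.Random) -> int:
    """Draw a holdoff uniformly from 0 up to the current bound, inclusive."""
    return rng.randrange(holdoff_bound_ns(attempt) + 1)
```

Spreading retries randomly over a growing window makes it less likely that several busy-acknowledged requesters collide again on the same cycle.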



12.11 X-Bus Reject 

Two successive bus address transfers may be issued by the same BIF in bus cycles spaced
apart by only one NOP or foreign cycle. If the first request receives a busy acknowledge,
the acknowledge is received only after the second request has been sent. In this case, the
bus REJECT signal is immediately asserted. The REJECT signal is interpreted by the slave
as nullifying the already accepted request. Using REJECT retains the order of transfers on
the bus. This is important when the second request is a read for the same data that is being
written by the first request.

When REJECT is asserted, the acknowledge for the second request is ignored. When
REJECT is asserted, all transaction side effects, such as bus locking, do not take place.

12.11.1 X-Bus Write Order Assurance

Using REJECT in cooperation with the write order assurance of the write queue guarantees
that the write order of one CPU is always preserved, as seen by a second CPU. This
permits some forms of multiprocessor synchronization, without needing bus locking.

Chapter 13

Bus Interface Registers 



13.1 Interrupt Posting

There are 16 interrupt posting longword addresses to which the BIF responds as a destination.
The addresses are in subsequent longwords.



Interrupt Posting Address (Write Only)    00pp 0100 to 00pp 013C

31                                        00
Data Not Interpreted

pp = Processor Select Number

Address offsets:  00, 04, 08, 0C, 10, 14, 18, 1C,
                  20, 24, 28, 2C, 30, 34, 38, 3C



Figure 13-1. Interrupt Posting Address Register



Interrupts are always accepted by the processor to which they are directed. The interrupt
originator receives no acknowledge. In effect, storing to an interrupt posting address simply
requests an interrupt in the destination processor. There are 16 interrupt classes. The lower
numbered interrupt posting address corresponds to the lower numbered interrupt class.
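
Since the 16 posting addresses are consecutive longwords and lower addresses map to lower classes, the class number follows directly from the offset. The sketch below assumes class 0 sits at offset 0100 within the processor's range, which the text implies but does not state outright. The function names are hypothetical.

```python
POSTING_BASE = 0x0100   # first interrupt posting offset
POSTING_LAST = 0x013C   # last interrupt posting offset


def interrupt_class(offset: int) -> int:
    """Map an interrupt posting offset (0100..013C) to its class (0..15)."""
    if not POSTING_BASE <= offset <= POSTING_LAST or offset % 4:
        raise ValueError("not an interrupt posting address")
    return (offset - POSTING_BASE) // 4


def posting_offset(interrupt_class_number: int) -> int:
    """Inverse mapping: class number (0..15) to posting offset."""
    if not 0 <= interrupt_class_number <= 15:
        raise ValueError("interrupt class out of range")
    return POSTING_BASE + 4 * interrupt_class_number
```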
13.2 Interrupt Control Register

Associated with each interrupting address in a processor are both an interrupt enable and
an interrupt pend flag. These 2 bits are available in the interrupt control register,
INTCTL. The register should be read and written only as a longword quantity.



Interrupt Control (INTCTL) (Read/Write)    0000 1208

31 30              16 15              00
   IENAB[14:00]       IPEND[15:00]

IENAB = Interrupt enables for Interrupt Classes 0 to 14 (Read, Write 1 to XOR)
IPEND = Interrupt requests for Interrupt Classes 0 to 15 (Read Only)

Interrupt Class 15 is Always Enabled



Figure 13-2. Interrupt Control Register



The interrupt pend bit is set when a write to the associated interrupting address is detected.
The pended interrupt causes a response when its specific interrupt enable bit is set
and there is no comprehensive trap masking in effect. The highest priority enabled interrupt
pend bit is cleared automatically when the processor reads the interrupt summary register.
The corresponding interrupt enable bit is also cleared simultaneously.

The interrupt enable bits may be set and cleared directly by processor writes to the
INTCTL register. Storing to the INTCTL register loads the interrupt enable portion of the
register with the XOR of the current register contents and the store data. This permits the
needed selective updates of register contents.
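
The XOR store semantics can be modelled in a few lines. The sketch assumes the layout shown in the register figure (IENAB in bits 30:16, read-only IPEND in bits 15:0). The helper names are hypothetical.

```python
IENAB_SHIFT = 16
IENAB_MASK = 0x7FFF << IENAB_SHIFT   # enables for classes 0..14, bits 30:16


def store_intctl(current: int, store_data: int) -> int:
    """Model a store to INTCTL: enable bits become current XOR store data.

    IPEND (bits 15:0) is read only, so store data outside IENAB is ignored.
    """
    return current ^ (store_data & IENAB_MASK)


def enable_mask_for(current: int, interrupt_class: int, enable: bool) -> int:
    """Build store data that forces one class's enable bit to a given value."""
    bit = 1 << (IENAB_SHIFT + interrupt_class)
    currently_set = bool(current & bit)
    # Write a 1 only where the bit must toggle; writing 0 leaves it unchanged.
    return bit if currently_set != enable else 0
```

Because a written 1 toggles rather than sets, software that wants a definite value first reads the register and then writes 1 only to the bits that must change, as `enable_mask_for` does.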



13.2.1 Non Maskable Interrupt



Interrupt level 15 cannot be masked.

13.3 Interrupt Summary Register

The interrupt summary register identifies the highest priority interrupt that is both pending
and enabled. If no interrupt is pending, ISUM<4:0> is set to zero. The register should be
read only as a longword quantity.

Interrupt Summary Register (ISUM) (Read Only)    0000 1200

31                        05 04 03        00
                              I   ISUM

ISUM = Highest Interrupting Level (Read Only)
I = 1: Enabled Interrupt Pending

Reading clears IPEND(ISUM) and IENAB(ISUM) in the INTCTL Register

Figure 13-3. Interrupt Summary Register
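
The read side effect can be sketched as a pure function over the pend and enable bits. Two points are assumptions: that a higher class number means higher priority (suggested by class 15 being non-maskable, but not stated), and that the I flag sits in bit 4 with the level in bits 3:0, as the figure's bit boundaries suggest. The function name is hypothetical.

```python
def read_isum(ipend: int, ienab: int) -> tuple[int, int, int]:
    """Model a read of ISUM; return (isum_value, new_ipend, new_ienab).

    ipend holds classes 0..15, ienab holds classes 0..14 (class 15 is always enabled).
    """
    effective_enable = (ienab & 0x7FFF) | (1 << 15)
    active = ipend & effective_enable & 0xFFFF
    if not active:
        return 0, ipend, ienab            # ISUM<4:0> reads as zero
    level = active.bit_length() - 1       # assumption: highest class wins
    value = (1 << 4) | level              # I flag plus interrupting level
    # Reading clears both the pend bit and the enable bit of the reported class.
    new_ipend = ipend & ~(1 << level)
    new_ienab = ienab & ~(1 << level) & 0x7FFF
    return value, new_ipend, new_ienab
```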



13.4 Bus Control/Status Register 

The Bus Control/Status Register (BUS_CSR) permits operational code access to the DTS
force hit and miss functions. The BUS_CSR also captures the overall state of any software
recoverable error detected by the BIF. The register should always be read and written only
as a longword quantity.

The HI and HO bits force the duplicate instruction and data/operand tag stores to hit when
a lookup for an X-Bus write is in progress. The MI and MO bits force that lookup to miss.
The operation, during which both the force hit and force miss bits for the same duplicate
tag store are set, is undefined.

The EN and EL bits are the trap enables for bus write no response and bus lock timeout
respectively. When either trap is pending, whether enabled or not, the corresponding W or
L bit is also set. The trap must be explicitly acknowledged in software by writing a 0
into both W and L. Setting W or L nonzero while the associated trap is enabled triggers
an IP trap. Breaking a lock by trap dispatch is not recorded as a lock timeout.

Bus Control/Status Register (Read/Write)    0000 1210

Bit boundaries shown: 31 30 29 28 27 26 25 24 ... 21 18 17 16 15 ... 05 04 03 00
Fields, high to low:  HI HO MI MO EN EL C | NORESP | W L | ISUM

HI = 1     Force Hit, DITS                        Read/Write
HO = 1     Force Hit, DOTS                        Read/Write
MI = 1     Force Miss, DITS                       Read/Write
MO = 1     Force Miss, DOTS                       Read/Write
EN = 1     Enable Bus No Response Trap            Read/Write
EL = 1     Enable Lock Timeout Trap               Read/Write
C = 1      Enable Process Timer Counting
W = 1      Bus Write No Response Trap Pending     Read/Write
L = 1      Lock Timeout Trap Pending              Read/Write
ISUM       Copy of the ISUM Register              Read Only

Marked fields: Write 1 to XOR (e.g., to clear status)

NORESP
0000    No Address Captured
1--0    Read Address Captured
-1-0    Write Address Captured
--10    Fetch Address Captured
1--1    Read Address Captured, Subsequent No Response
-1-1    Write Address Captured, Subsequent No Response
--11    Fetch Address Captured, Subsequent No Response



Figure 13-4. Bus Control/Status Register



The NORESP field indicates what address has been captured in the ERRADDR register.
This field is usually zero, except after a no response ack on the X-Bus. When this field
becomes non-zero, whether by software action or because the BIF doesn't receive a bus
response, the ERRADDR register ceases to clock. If multiple failures to respond have
occurred, the LSB of the field is set. The remaining bits and the ERRADDR reflect only the
first failure. The lack of bus acknowledge results in either a write no response trap from
the BIF, or a trap from the MMU. The NORESP field is zeroed by the trap handler after
the ERRADDR has been recovered.

13.5 Bus Error Address

The physical address of any read, write or fetch request that receives no bus acknowledge
upon transfer is captured in the bus error address register, ERRADDR. The register begins
clocking again only after the software has cleared the NORESP field of the BUS_CSR. This
field also associates the ERRADDR register contents with the transfer type.



Bus Error Address Register (ERRADDR) (Read Only)    0000 1218

31 30 29                              02 01 00
      ERROR ADDRESS [29:02]



Figure 13-5. Bus Error Address Register

The captured error address may not correspond directly to the program requested address
because of cache fill address zeroing, or write merging.



PROCESS TIMER (PROC_TIMER) (Read/Write)    0000 1220

31              17 16                    00

Counts up, and interrupts on overflow into Bit 16. Bit 0 = 4 microseconds

Figure 13-6. Process Timer Register
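
As a worked example of the timer arithmetic: if the count occupies bits 15:0 and advances once every 4 microseconds (an interpretation of "Bit 0 = 4 microseconds"), a carry into bit 16 occurs after at most 2^16 ticks, or about 262 milliseconds. The helper names below are hypothetical.

```python
TICK_US = 4          # one count of bit 0 represents 4 microseconds
OVERFLOW_BIT = 16    # the interrupt is raised on a carry into bit 16


def ticks_until_interrupt(count: int) -> int:
    """Ticks remaining before bits 15:0 carry into bit 16."""
    return (1 << OVERFLOW_BIT) - (count & 0xFFFF)


def microseconds_until_interrupt(count: int) -> int:
    """Time remaining, in microseconds, before the overflow interrupt."""
    return ticks_until_interrupt(count) * TICK_US


def preload_for_interval(interval_us: int) -> int:
    """Count to load so the interrupt arrives after roughly interval_us."""
    ticks = max(1, min(interval_us // TICK_US, 1 << OVERFLOW_BIT))
    return ((1 << OVERFLOW_BIT) - ticks) & 0xFFFF
```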
13.6 BIF Buried/Scan State

Buried state, that is, state readable and writable under scan control only, is provided in the
BIF. Some of the state is needed for functional operation (that is, the board ID). Some of the
state is used to selectively disable various accelerators in the BIF. This latter state is used
for diagnostic assistance.

13.6.1 Board ID 

There is a four-bit board identifier field, BD_ID (3:0), in the scan ring. The field is used
for slave address decoding and read address source ID. The lower two bits also decide on
which Class B arbitration level the IC is operating. This field is only in the CBA gate
array.

13.6.2 Arbitration Level

There is a two-bit arbitration level field, ARB_LEVEL (1:0), in the scan ring. The field
should be set to the same value as BD_ID (1:0). It is used to decide on which Class B
arbitration level the IC is operating. This field is in the CBD gate arrays.

13.6.3 Write Multiple Inhibit

There is a one-bit WRITE_MULTIPLE_INHIBIT bit in the scan ring. When set, the BIF
does not generate write multiples other than quadwrites. This field is only in the CBA gate
array.

13.6.4 Write Merge Inhibit

There is a one-bit WRITE_MERGE_INHIBIT bit in the scan ring. When set, the BIF does
not generate write multiples other than quadwrites. This field is only in the CBA gate
array.

13.6.5 Read Before Write Inhibit

There is a one-bit READ_BEFORE_WRITE_INHIBIT bit in the scan ring. When set, the
BIF does not permit data cache reads to precede data cache writes. This field is only in
the CBA gate array.

13.6.6 Write Holdoff Inhibit

There is a one-bit WRITE_HOLDOFF_INHIBIT bit in the scan ring. When set, the BIF
issues queued writes as soon as possible. This field is only in the CBA gate array.

13.6.7 Instruction Cache Parity Inhibit

There is a one-bit NO_ICACHE_PARITY bit in the scan ring. When set, the BIF never
checks instruction cache data parity. This field is only in the CBD gate arrays.

13.6.8 Data Cache Parity Inhibit

There is a one-bit NO_DCACHE_PARITY bit in the scan ring. When set, the BIF never
checks data cache data parity. This field is only in the CBD gate arrays.

13.6.9 DTS Parity Inhibit

There is a one-bit NO_DTS_PARITY bit in the scan ring. When set, the BIF never checks
parity in the DITS or DOTS. This field is only in the CBA gate array.

13.6.10 Force Parity Sense

There are two FORCE_PARITY (1:0) bits in the scan ring. When zero, the BIF generates
normal parity. When nonzero, the BIF forces all output parity to Ones or Zeros in the
DITS, DOTS, and the instruction and data caches. FORCE_PARITY = 10 generates Zeros.
FORCE_PARITY = 11 generates Ones.

This field is present in both the CBA and CBD gate arrays. The CBA field controls
simultaneously both the DITS and DOTS parity. The CBD field controls both the instruction
cache data and data cache data parity.

13.6.11 DTS Parity Error

There is a one-bit DTS_PARITY_ERR bit in the scan ring. It's set when a DTS parity error
is detected and remains set until cleared under scan control. When set, the BIF signals
the clocks to stop. This bit is only in the CBA gate array.

13.6.12 Instruction Cache Parity Error

There is a one-bit INST_PARITY_ERR bit in the scan ring. It's set when an instruction
cache data parity error is detected and remains set until cleared under scan control. When
set, the BIF signals the clocks to stop. This bit is only in the CBD gate array.

13.6.13 Data Cache Parity Error

There is a one-bit DATA_PARITY_ERR bit in the scan ring. It's set when a data cache
data parity error is detected and remains set until cleared under scan control. When set,
the BIF signals the clocks to stop. This bit is only in the CBD gate array.






13.6.14 X-BUS Overlap Control 

There is a one-bit ONE_ATATIME bit in the scan ring. When set, the BIF does not issue
a second X-Bus reference before the last is fully complete. For a write, this means a
successful ACK. For a read, this means a successful read data return. This field is only in the
CBA gate array.

13.6.15 Retry Backoff Inhibit

There is a one-bit NO_BACKOFF bit in the scan ring. When set, the BIF reissues retry
requests as soon as possible. This field is only in the CBA gate array.

13.6.16 Read Response Error

There is a READ_RESPONSE_ERROR bit in the scan ring. It's set when the BIF accepts a
READ RESPONSE which triggers an error acknowledge. Typically, this would be a parity
error. The bit remains set until cleared under scan control. When set, the BIF signals the
clocks to stop. This field is only in the CBD gate arrays.

13.6.17 Arbitration Timeout

There is an ARB_TIMEOUT bit in the scan ring. It's set when the BIF's arbitration timer
elapses before acquiring the X-Bus. The bit remains set until cleared under scan control.
When set, the BIF signals the clocks to stop. This field is only in the CBA gate array.

13.6.18 Read Return Timeout

There is a READ_RETURN_TIMEOUT bit in the scan ring. It's set when the BIF's read
return timer elapses before an expected READ RESPONSE arrives. The bit remains set
until cleared under scan control. When set, the BIF signals the clocks to stop. This field is
only in the CBA gate array.

13.6.19 Error Acknowledge

There is an ERROR_ACKNOWLEDGE bit in the scan ring. It's set when the BIF receives an
error acknowledgement to an address transfer. It's also set when a no acknowledge
response to a data transfer cycle of a write multiple occurs. The bit remains set until cleared
under scan control. This bit does not request clock stopping. This field is only in the CBA
gate array.

13.6.20 DTS RAM Diagnostic Address Generation

There is a one-bit DTS_DIAGADDR bit in the scan ring. When set, the BIF CBA generates
increasing DTSINDEX addresses. These addresses are used for the DTS and primary
cache RAM selftests. This bit is only in the CBA gate array.

13.6.21 DTS Diagnostic Data Generation Control

There is a one-bit DTS_DATALD bit in the scan ring. It is used to control the source of
data for writing and comparison during the DTS selftest. This bit is only in the CBA gate
array.

13.6.22 DTS Diagnostic Data Writing Control

There is a one-bit DTS_DIAGWE bit in the scan ring. When set, diagnostic data is written
into the DTS RAMs during every cycle. This bit is only in the CBA gate array.

13.6.23 DTS Diagnostic Error

There is a one-bit DTS_TESTERR bit in the scan ring. It is set if there is a miscompare
during the DTS RAM selftest. This bit is only in the CBA gate array.

13.6.24 Cache Diagnostic Data Generation Control

There is a one-bit CACHE_DATALD bit in the scan ring. It is used to control the source
of data for writing and comparing during the cache data selftest. This bit is in the CBD
gate arrays.

13.6.25 Cache Diagnostic Data Writing Control

There is a one-bit CACHE_DIAGWE bit in the scan ring. When set, diagnostic data is
written into the cache data RAMs during every cycle. This bit is in the CBD gate arrays.

13.6.26 Cache Diagnostic Error

There is a one-bit CACHE_TESTERR bit in the scan ring. It is set if there is a miscompare
during the selftest of the cache data and parity RAMs. This bit is in the CBD gate
arrays.
13.7 IP Trapping

A three-bit trap code is sent from the BIF to the IP. There are only five useful codes
derived from these three bits. BIF_ERROR is either a write bus no-response acknowledge or
a lock timeout. The BUS_CSR must be read to determine which is the case.

Trap request code, REQ(2:0)

000    No Request
001    BIF Error
010    Interrupt
011    BIF Error/Interrupt
1xx    NMI

Whenever the IP initiates a trap sequence, the signal IP_TRAP_DISP is asserted. Asserting
this signal unconditionally releases the bus lock.
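
The trap code table can be read as a small decoder. The sketch below is illustrative only: the function name `decode_trap_code` is hypothetical, and it assumes that every code with the most significant bit set means NMI, which is how the last table row is interpreted here.

```python
def decode_trap_code(code: int) -> str:
    """Map a 3-bit BIF-to-IP trap request code to its meaning."""
    if not 0 <= code <= 0b111:
        raise ValueError("trap code must fit in 3 bits")
    if code & 0b100:
        # Assumption: the two low-order bits are don't-care when bit 2 is set.
        return "NMI"
    return {
        0b000: "No Request",
        0b001: "BIF Error",
        0b010: "Interrupt",
        0b011: "BIF Error/Interrupt",
    }[code]
```

On a "BIF Error" result, software would still have to read BUS_CSR to tell a no-response acknowledge from a lock timeout, as the text states.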

Chapter 14

Invalidate Pipeline 



14.1 Duplicate Tag Stores 

The Duplicate Tag Store (DTS) is a copy of the CPU's Instruction and Operand Cache
Tag Store which is used to compare addresses being modified on the X-Bus against the
contents of the caches. If a match between a location being modified on the X-Bus and a
DTS entry is found, that entry is invalidated in the corresponding cache. Performing this
operation without the DTS would waste many cache cycles comparing the cache tags
against X-Bus memory modify transactions.

The duplicate instruction tag store is referred to as DITS. The duplicate data or operand
tag store is referred to as DOTS.



14.1.1 DTS Addressing

The DTS, like the principal caches, is indexed with virtual addresses. The X-Bus deals
only with physical addresses. The virtual address of a transaction is formed by using the
12 LSBs of the physical address, which are the same as the 12 LSBs of the virtual address,
and concatenating them with enough of the virtual address to index the cache. For the
CPU's 128 KB instruction cache, 5 virtual bits are required. For the CPU's 64 KB data
cache, 4 virtual bits are required. These bits accompany the physical address on the X-Bus.
Figure 14-1. Duplicate Tag Store Addressing
(Figure labels: Duplicate Tag Store Index; Byte Address Within a Page; Virtual Address (VPN);
Physical Address; Byte Select (Not Used to Index DTS).)

Duplicate Tag Store Addressing: Bits 16 through 3 are used to address the Duplicate Tag
Store. Bits 16 through 12 are taken from the VPN of the X-Bus transaction. Bits 11
through 3 are taken from the physical address. One less bit is required to address the
Duplicate Operand Cache Store than the Duplicate Instruction Cache Store. Only 13 bits are
used to address the DOTS (Bit 16 is tied to a fixed value).

DITS and DOTS are commonly addressed.
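
The bit selection just described can be sketched as follows. This is a minimal illustration, not the hardware's actual logic: the function name `dts_index`, the packing of the five VPN bits into one integer, and the choice of 0 as the fixed value of bit 16 for the DOTS are all assumptions.

```python
def dts_index(physical_addr: int, vpn_bits: int, operand: bool = False) -> int:
    """Build a DTS entry index from address bits 16..3.

    vpn_bits carries virtual address bits 16..12, with address bit 12 in
    bit 0 of vpn_bits (an assumed packing). The returned value is the entry
    number, i.e. address bits 16..3 shifted down by 3.
    """
    low = (physical_addr >> 3) & 0x1FF    # address bits 11..3, same in PA and VA
    high = vpn_bits & 0x1F                # address bits 16..12 from the VPN
    if operand:
        high &= 0x0F                      # DOTS: bit 16 tied to a fixed value (0 assumed)
    return (high << 9) | low
```

With this packing the DITS index spans 14 bits and the DOTS index 13 bits, matching the one-bit difference noted above.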



14.1.2 DTS Contents



Each DTS entry contains two fields: an 18-bit physical tag and a 1-bit parity check bit.
These fields are shown in Figure 14-2.

The physical tag is the 18-bit physical page number which, along with a 12-bit index,
addresses 1 gigabyte (30 bits) of physical address space. The parity bit is an odd parity
check bit.



There is no explicit valid bit. An invalid entry points to an unlikely memory location (0).
Example: 

physical tag = 000000000000000000 
parity bit = 1 
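
The example values follow from the odd parity rule: an all-zero tag needs a parity bit of 1 so that the total number of ones is odd. A small sketch, with hypothetical names `odd_parity_bit` and `INVALID_ENTRY`:

```python
TAG_BITS = 18  # width of the physical page number held in each DTS entry

def odd_parity_bit(tag: int) -> int:
    """Return the check bit that makes (tag bits + check bit) contain an odd number of ones."""
    ones = bin(tag & ((1 << TAG_BITS) - 1)).count("1")
    return 0 if ones % 2 == 1 else 1

# An invalid entry is encoded as tag 0 with its matching odd parity bit.
INVALID_ENTRY = (0, odd_parity_bit(0))
```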
Figure 14-2. Duplicate Tag Store Contents
(Figure labels: Physical Page Number; Parity Check Bit.)

Duplicate Tag Store Contents: The Duplicate Tag Stores contain an 18-bit physical page
number and a parity check bit.



14.2 DTS Functional Overview 

Duplicate Tag Store operations can be divided into the following categories:



• DTS lookup

• DTS hit

• DTS allocate from processor write

• DTS allocate from read response

The DTS acts as an imperfect filter for cache invalidates. Any time some other system
device (including another CPU) modifies a memory location, the DTS is checked to see if
that location is currently resident in either of the CPU's caches. If it is present, a cache
cycle is stolen from the cache that contains that location. The entry in the cache and the
entry in the DTS are invalidated. The DTS may mark as valid some entries that are no
longer valid in the caches. This generates a needless cache invalidate cycle.

The DTS is updated in two separate situations, similar to the main caches. The first is
when the CPU modifies a location by executing a STORE operation. The second is when a
cache miss is generated and the data returns on the X-Bus.
Figure 14-3. Basic Duplicate Tag Store Data Paths



14.3 DTS Lookup



A joint lookup of the DITS and DOTS is performed whenever the following transactions
are detected on the X-Bus:



• WRITE from another device

• WRITE MULT followed by WRITE DATA from another device

A lookup only of the DITS is performed whenever the following transactions are detected
on the X-Bus:



• WRITE from this CPU

• WRITE MULT followed by WRITE DATA from this CPU
The DTS lookup is basically handled in three pipeline stages. The following stages are
slaved to the operation of the X-Bus:

• COMMAND DECODE 

• DTS ACCESS 

• TAG COMPARE 

14.3.1 DTS Lookup: Write 

The CMD field is decoded during the first cycle after the X-Bus write transaction. If a
WRITE operation is decoded, the address to be used as a DTS index is loaded into the
DTS INDEX register. During the next cycle, the DITS is accessed in a read operation. The
DOTS is optionally accessed. The tags are compared, as required, to the physical page
number. If the PPN and DTS tag match, a cache entry invalidate and a DTS entry
invalidate are scheduled.
Figure 14-4. DTS Lookup Pipeline Schedule for WRITE or WRITE UNLOCK

Cycle 1    A WRITE transaction is on the bus. The transaction is loaded into the BIF's
           X-Bus input registers.

Cycle 2    The command is decoded. If it is a WRITE, the DTS index register is
           loaded from the physical address and the VPN. The physical address
           is piped forward for the tag compare(s).

Cycle 3    A DTS read access takes place. The tag is compared to the physical
           address. If a match occurs, a cache entry invalidate and a DTS entry
           invalidate are scheduled.
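
The filtering behaviour of a lookup can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not a description of the hardware: the class and function names are hypothetical, parity is ignored, and an absent dictionary key stands in for the "tag 0" invalid encoding.

```python
class DuplicateTagStore:
    """Toy model of one DTS (DITS or DOTS): entry index -> physical page number."""

    def __init__(self):
        self.tags = {}

    def allocate(self, index, ppn):
        self.tags[index] = ppn

    def lookup(self, index, ppn):
        return self.tags.get(index) == ppn

    def invalidate(self, index):
        self.tags.pop(index, None)


def snoop_write(dts, index, ppn):
    """Handle a WRITE seen on the bus; return True if a cache cycle must be stolen."""
    if dts.lookup(index, ppn):
        dts.invalidate(index)   # keeps the DTS consistent with the cache
        return True             # the matching cache entry must be invalidated too
    return False                # miss: the cache is not disturbed
```

The value of the filter is the `False` path: writes to locations not held in the cache cost the cache nothing.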

14.3.2 DTS Lookup: Write Multiple 

If the command is decoded and determined to be a WRITE MULTIPLE transaction, the
address is stored in the DTS index. During the following cycle, when the corresponding
WRITE MULTIPLE DATA is decoded, the first lookup is optionally done if the WRITE
MULTIPLE began on an odd longword boundary. Otherwise, the address is held in the
DTSINDEX. Thereafter, the DTSINDEX is loaded with its former contents, plus or minus
8 bytes (depending on whether the WRITE MULTIPLE was ascending or descending),
anticipating the next WRITE MULTIPLE DATA cycle.
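
The stepping of the DTSINDEX across data cycles can be sketched as below. The function name `write_multiple_lookups` is hypothetical, the starting address is assumed to be quadword aligned, and the optional odd-longword first lookup is deliberately left out.

```python
def write_multiple_lookups(start_addr: int, data_cycles: int, ascending: bool = True):
    """Return the address looked up in the DTS for each WRITE MULTIPLE DATA cycle."""
    step = 8 if ascending else -8     # one quadword per data cycle
    return [start_addr + i * step for i in range(data_cycles)]
```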
Figure 14-5. DTS Lookup Pipeline Schedule for WRITE MULT
with Two Data Transfer Cycles

Cycle 1    A WRITE MULTIPLE (WM) transaction is on the bus. The transaction is
           loaded into the BIF's X-Bus input registers.

Cycle 2    The command is decoded. If it is a WRITE MULTIPLE, the address
           used to index the DTS is loaded into the DTSINDEX register. At this
           time, the first quadword of the WRITE MULTIPLE DATA is on the
           X-Bus (WD1).

Cycle 3    WRITE MULTIPLE DATA is decoded, and the address in the
           DTSINDEX is optionally incremented or decremented by 4 bytes. The
           optional odd longword (WD0) lookup occurs. If a match occurs, a cache
           entry invalidate and a DTS entry invalidate are scheduled.

Cycle 4    A DTS read access takes place for WD1. The tag is compared to the
           physical address. If a match occurs, a cache entry invalidate and a DTS
           entry invalidate are scheduled.

Cycle 5    A DTS read access takes place for WD2. The tag is compared to the
           physical address. If a match occurs, a cache entry invalidate and a DTS
           entry invalidate are scheduled.
14.3.3 DTS Lookup Hit Processing

When a memory modify operation by another device causes a hit in either DTS, or a
locally generated write hits in the DITS, two events are scheduled. The first event
invalidates the entry (or entries) which caused the hit in the main cache. The second event
invalidates the entry (or entries) in the DTS to make it consistent with the main caches.
When modifying a memory location that is also in the local caches, it usually takes six
cycles for a WRITE to proceed from the X-Bus to that entry being invalidated.

• Transaction on X-Bus

• Command decoded

• DTS accessed

• PA bus arbitration

• PA BUS/EASRC/PCSRC transfer

• Cache tag write(s)

The DTS entry invalidate is placed in a queue awaiting a free DTS cycle. Once a hit has
been detected, the hitting index is loaded into the address register of the cache
corresponding to the DTS that contains the hit. During the following cycle, the DTS lookup
is used to complete the address compare. It requests use of the PA bus during the following
cycle. The PA bus is always available except when the DTS invalidate pipeline is preempted
by a READ RESPONSE operation filling a cache miss (discussed later). In the cycle
following PA arbitration, the index is driven off the BIF address chip, and the MMU enables
the drivers to either the PCSRC bus or the EASRC bus (or both). An index hitting in the
DITS makes its way to the PC register, while one hitting in the DOTS must be loaded into
the EA register. An index hitting in both the DITS and DOTS is loaded into both the EA
and PC registers.
Figure 14-6. Cache Invalidate Datapaths (not all bus sources are shown)

Figure 14-7. DTS Hit With Cache Entry Invalidate and Delayed DTS Entry Invalidate

Cycle 1    A WRITE (W) transaction is on the bus. The transaction is loaded into the
           BIF's X-Bus input registers.

Cycle 2    The command is decoded. The physical address is piped forward for
           the tag compare. The virtual index is loaded into the DTS index register.

Cycle 3    A read operation is performed on the DTS.

Cycle 4    The results of the tag compare are available. Since there was a hit, the
           PASRC bus is requested. The DTS entry invalidate(s) are queued for
           execution when the DTS is available.

Cycle 5    The virtual index of the location to be invalidated is passed via the
           PASRC bus to the appropriate cache address register.

Cycle 6    The cache entry causing the DTS hit is invalidated.



14.4 DTS Allocate from Processor Writes 

When the CPU modifies an operand cache location via a store instruction, the DOTS must
also be updated to reflect the cache's new state. The update occurs after the transaction is
placed on the X-Bus. This avoids DTS conflicts by using the X-Bus as a synchronization
point for DTS access. Only one device can use the X-Bus at a time and that device has to
arbitrate to obtain the bus. The only DTS operations that are not synchronized through the
X-Bus are the DTS entry invalidates. They are lower priority than the other DTS
operations.
14.4.1 DTS Allocate: Write

The BIF address chip decodes a WRITE operation it has generated on the X-Bus. It
writes the new tag into the DOTS while doing a lookup into the DITS during the following
cycles. A hit occurring in the DITS at this point indicates that the processor is modifying a
location that has been cached in the instruction cache. An instruction cache entry
invalidate and a DITS entry invalidate are scheduled.

While the DTS write allocate occurs, the DTS index must be compared against every index
in the DTS entry invalidate queue that is scheduled to invalidate an entry in the DOTS. If
any of the compares succeed, that DTS entry invalidate must be invalidated. If the
invalidate was scheduled for both the DITS and DOTS, it is retagged as being only for the
DITS. In this way, an old pending DOTS entry invalidate won't destroy a recently
allocated entry.
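
The queue check performed during a write allocate can be sketched as follows. The data layout is an assumption made for illustration: `PendingInvalidate` and `retag_on_dots_allocate` are hypothetical names, and the queue is modelled as a plain list rather than hardware comparators at each entry.

```python
from dataclasses import dataclass

@dataclass
class PendingInvalidate:
    index: int    # DTS index the queued invalidate targets
    dits: bool    # invalidate applies to the DITS
    dots: bool    # invalidate applies to the DOTS

def retag_on_dots_allocate(queue, index):
    """Return the queue after a DOTS allocate at `index`.

    Queued invalidates aimed at the freshly written DOTS entry are cancelled;
    those aimed at both stores are kept for the DITS only.
    """
    kept = []
    for entry in queue:
        if entry.index == index and entry.dots:
            if entry.dits:
                kept.append(PendingInvalidate(entry.index, True, False))
            # A DOTS-only invalidate is dropped entirely.
        else:
            kept.append(entry)
    return kept
```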

14.4.2 DTS Allocate: Write Multiple 

A WRITE MULTIPLE from the CPU is treated just like a WRITE MULTIPLE from
another device. The only difference is that the DOTS is written into with the physical tag,
rather than read and checked for tag match.

Figure 14-8. DTS Allocate From Processor Write

Cycle 1    Processor write is placed on X-Bus from WRITE BUFFER.

Cycle 2    The write is decoded and also determined to be from the same CPU.

Cycle 3    The DOTS is updated with the new physical tag and the valid bit is
           set. The DITS is checked for a tag compare and, if a hit occurs, the
           instruction cache entry invalidate and DITS entry invalidate are
           scheduled in the usual way.
14.5 DTS Allocate from Read Response

The DTS is also written when a READ RESPONSE returns in reply to a READ MULTIPLE
made by the same CPU. When a cacheable miss occurs in a cache, a READ MULTIPLE
request is sent to main memory. Main memory returns the requested data in the form
of successive READ RESPONSEs. Upon decoding the expected READ RESPONSE command,
the BIF sends the associated tag to the waiting cache and enters the tag into the
DTS using the conventional DTS pipeline. No tag comparison is performed during this DTS
cycle, and only the DTS corresponding to the cache that missed is updated.

Figure 14-9. DTS Index Increment/Decrement Datapaths



Three sets of addresses must be stored and manipulated when addressing the DTS. The
DTS index register is used when processing WRITE MULTIPLEs. Two other registers are
used to hold the addresses associated with two possible pending cache miss READ
RESPONSEs.
Figure 14-10. READ MULTIPLE Request
and READ RESPONSE Scenario with DTS Update

Cycle 1        A cache miss causes the BIF to place a READ MULTIPLE request on
               the X-Bus.

Cycle 2        The command is decoded and determined to be a self-generated
               READ MULTIPLE. The VPN and physical address are stored in the
               appropriate pending operation holding register. Which pending
               operation holding register is used depends on the X-Bus SUBID
               signaling whether it is an instruction or operand cache miss.

Cycle 3...N-1  The memory subsystem is processing the READ MULTIPLE.

Cycle N        The memory subsystem places the first of two READ RESPONSE
               transactions on the X-Bus.

Cycle N+1      The second READ RESPONSE is on the X-Bus. The first READ
               RESPONSE is decoded and the corresponding address is loaded from the
               holding register to the DTS index. The holding register is then loaded
               with its contents ± 8 bytes, depending on the ordering for that type of
               operation (I-miss or D-miss).

Cycle N+2      The first READ RESPONSE is updating the DTS. The second READ
               RESPONSE is decoded, the contents of the holding register are again
               transferred to the DTS index register, and the holding register is
               stepped (± 8 bytes).

Cycle N+3      The second READ RESPONSE updates the DTS.
Chapter 15

Write Pipeline 



15.1 Write Buffer Overview 

The write buffer serves two purposes. First, it isolates the processor from memory and bus
latencies during stores. Second, it reduces overall bus traffic.

The write buffer isolates the processor from memory and bus latencies by offering a high
bandwidth FIFO queue for store operations. The processor can submit many back-to-back
stores and continue functioning while this queue is emptied, through the X-Bus, into
memory as both become available.

The write buffer serves to reduce bus traffic by collapsing and grouping small, adjacent
writes into large single blocks which make better use of the X-Bus and main memory
resources.

Figure 15-1. The Write Buffer
15.1.1 FIFO Organization

The write buffer is physically split across the CBA and CBD gate arrays. The CBA holds
the address portion of the queue and the CBD holds the associated data. There are 64
data bits associated with every queue address.

The queue is structured as a variable depth FIFO. Entries are added to the bottom of the
queue and removed from the top. The top of the queue is always at a fixed point. The
bottom of the queue varies depending on the current number of queue entries.

There are address comparators at every queue entry. These comparators are used to
decide whether newly arriving write data may be merged with the current queue contents.
This write compaction reduces bus and memory bandwidth requirements. The address
comparator is also used to permit reads to bypass writes. The address comparators indicate
any read/write address collisions that would prevent the bypass.
Figure 15-2. Write Buffer Pipeline



The WRITE BUFFER pipeline shows data and addresses flowing from the processor to the
X-Bus, sometimes by way of a FIFO queue.

Queue entries are not unloaded until a successful X-Bus acknowledge is seen. Transmit
bypass is used when a second (or successive) X-Bus write is initiated before the first
acknowledge is received. Transmit bypass selects the first untransmitted queue entry as the
next address or data to send.
15.2 Write Address/Data Staging

The processor store data is captured from the cache DATA bus during the store's access
stage. Typically, the address follows in the next cycle on the PA bus. If the PA bus is not
available in that cycle, or there is a processor EVALID stall in effect, the data is held in
place by the MMU deasserting the MMU_HDATA_LD signal.

There are two inbound data staging registers and one address staging register before the
write queue proper. (See Figure 15-2.) One data staging register is used to compensate for
the early data arrival. The other data staging register, and the address staging register, are
used to allow the address comparisons to take place and control the load enables in the
queue. The address comparisons determine whether the store data may be merged with
data already present.



15.3 Write Queue Contents 

In addition to holding the data, each CBD data queue has a MSHALF_VALID and
LSHALF_VALID flag. The valid bits are used to determine whether there are any
contents in the entry. LSHALF_VALID and MSHALF_VALID are also used to control the
output write rotation needed for a 32-bit (or smaller) write to an even longword address.
There is a NOSWAP flag that defeats the output write rotation in case the MMU has
already rotated the data properly. If MSHALF_VALID and LSHALF_VALID are both
valid, a ^S" is sourced with correct parity during the address phase of a write multiple
transfer.

Table 15-1. MS_VALID, LS_VALID, NO_SWAP Decoding

MS_VALID   LS_VALID   NO_SWAP   Meaning
0          0          -         Empty
1          0          0         Even Long
1          0          1         Even Long - MMU
0          1          -         Odd Long
1          1          -         Quad

In addition to holding the address, the CBA address queue holds 4 BYTE_VALID bits and
the MSHALF_VALID and LSHALF_VALID flags. LSHALF_VALID is almost address bit
2. The four BYTE_VALID bits correspond to the 4-bit byte mask required for a 32-bit
bus write. The CBA sources these onto the X-Bus during the address phase of a write or
write multiple. There is no need for the NO_SWAP bit.
Table 15-2. MS_VALID, LS_VALID, BYTE_VALID Decoding

MS_VALID   LS_VALID   BYTE_VALID   Meaning
0          0          -            Empty
1          0          BBBB         Even Long
0          1          BBBB         Odd Long
1          1          -            Quad
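
The two valid flags alone determine the entry type, as the sketch below shows. The function name `entry_kind` is hypothetical, and the byte mask and NO_SWAP bits are ignored here.

```python
def entry_kind(ms_valid: int, ls_valid: int) -> str:
    """Classify a write queue entry from its two half-valid flags."""
    return {
        (0, 0): "Empty",
        (1, 0): "Even Long",
        (0, 1): "Odd Long",
        (1, 1): "Quad",
    }[(ms_valid, ls_valid)]
```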



The CBA IC also has other flags that control internal arbitration and write compaction.
There are NOCACHE, UNLOCK, INVTLBALL, and INVTLBE flags associated with each
address. Any of these flags being set inhibits write compaction and read around write.
UNLOCK releases the bus lock if the nesting level is 0 and this CBA holds the lock. The
invalidate TB flags force the selection of the TB invalidate bus command.



15.4 Write Queue Loading 

Unless the queue is full, processor stores are accepted and added to the queued data
without stalling the CPU. Typically, the store's data and address are added simultaneously to
the bottom of the address and data queues. The position of the queue's bottom is
determined by the first empty queue entry (measured from the queue's top). The affiliated
flags are set.

15.4.1 Load Merge 

If cacheable store data is being added to the queue, and the last valid entry in the queue
is also cacheable and agrees in the quadword address, the load data is merged into that
entry. The merging logically ORs the valid bits. The merging happens if the data to load is
a longword or quadword quantity. The merging is permitted if the data to load is a byte
(or word) in length. The merging is allowed if the queue entry is already a quadword, or if
the merge result does not spill over into the second longword.
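
The core of the merge rule (same quadword, both cacheable, valid bits ORed) can be sketched as below. This is a simplified illustration with hypothetical names; the size-dependent conditions for byte and word stores are not modelled.

```python
from dataclasses import dataclass

@dataclass
class QueueEntry:
    quad_addr: int          # address with the low 3 bits dropped
    ms_valid: bool
    ls_valid: bool
    cacheable: bool = True

def try_merge(last: QueueEntry, new: QueueEntry) -> bool:
    """Merge `new` into `last` in place; return True if the merge happened."""
    if not (last.cacheable and new.cacheable):
        return False
    if last.quad_addr != new.quad_addr:
        return False
    last.ms_valid = last.ms_valid or new.ms_valid   # valid bits are logically ORed
    last.ls_valid = last.ls_valid or new.ls_valid
    return True
```

When `try_merge` returns False, the store would instead take a new entry at the bottom of the queue.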

15.4.2 Write Buffer Full

When the last entry in the write queue is occupied, and the inbound data address register
is occupied or about to be occupied (MEM_CMD is requesting the use), the signal
WBUF_FULL is sent to the MMU to prevent any further stores from advancing. If there
is a store currently in its cache access stage, that store's data is captured and held,
but the store freezes in its EXC stage. The signal WBUF_FULL is deasserted the next time
the write queue advances.
15.5 Write Queue Unloading

The queue entries are unloaded during the cycle after receiving a successful acknowledge
for the address or data transfer on the X-Bus. If retry is required, the address/data is still
available in the write queue.

Write addresses are always taken from the write address queue. Only reads use the fast
pass address paths from the MMU. The fast pass paths are for quick posting of read miss
addresses in the event of default bus ownership.

15.5.1 Transmit Bypass

The address or data to send on the X-Bus is normally at the top of the queue. If, however,
the top entry in the queue has been transmitted but not acknowledged, the next-to-top
entry in the queue is used. During write multiples, queue data is being transmitted
every cycle. Since the queue must be accessed during the cycle before the X-Bus
transmission, and the queue unload occurs in the third cycle after the X-Bus transmission,
4 levels of transmit data bypassing are required. The four levels of bypassing allow
reaching back to the fifth queue entry from the top. This is shown in Figure 15-3.



ACCESS      D1   D2   D3   D4   D5
TRANSMIT         D1   D2   D3   D4
PEND                  D1   D2   D3
ACK                        D1   D2
UNLOAD                          D1

Figure 15-3. Transmit Bypassing

An additional level of transmit bypassing is provided in the address queue output delivery.
This allows a level of address look-ahead that enables early detection of write multiples.
The write multiple gets ahead when the first X-Bus cycle transmits only an address (no
data). This one cycle gap is enough to let the address transmit bypass pass ahead of the
data by one cycle.

Transmit bypass requires a SENT flag associated with the top 3 data and top 4 address
queue entries. A queue entry is bypassed if it is already sent, or the queue element in
front of it is already sent and there is another transfer on the bus at the time.
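
The selection rule, "send the first entry that has not yet been sent", can be sketched with a list of SENT flags ordered from the top of the queue. The function names are hypothetical and the model ignores the in-flight transfer condition mentioned above.

```python
def next_to_transmit(sent_flags):
    """Return the position (0 = top of queue) of the first unsent entry, or None."""
    for position, sent in enumerate(sent_flags):
        if not sent:
            return position
    return None

def reset_for_retry(sent_flags):
    """Clear every SENT flag so that transmission restarts from the top of the queue."""
    return [False] * len(sent_flags)
```

Clearing all flags corresponds to the retry case, where unacknowledged entries are still held in the queue and can simply be sent again.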
15.5.2 Transmit Retry

If a data or address X-Bus transfer receives an error or busy acknowledge, all queue
element sent bits are reset. The requests are retried. The REJECT signal may also be
asserted.

15.5.3 Write Multiple Collapse

If the next address to send is for a quadword, a WRITE MULTIPLE command is sent.
While the address is being transmitted on the X-Bus, the next queue address is checked
to see if it is also a quadword, and in an adjacent quadword.

The adjacency direction is determined by the write queue when it examines the lower
order bits of the next two addresses to transmit.

Write multiples are arbitrarily broken up on 256-byte boundaries to prevent any device
from holding the bus for extended periods of time.
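
A possible reading of the collapse test is sketched below. It assumes byte addresses, treats "adjacent" as exactly one quadword away in either direction, and interprets the 256-byte rule as "a burst may not cross a 256-byte aligned boundary"; the function name is hypothetical.

```python
QUADWORD_BYTES = 8
BLOCK_BYTES = 256

def can_extend_write_multiple(current_addr: int, next_addr: int, next_is_quad: bool) -> bool:
    """Decide whether the next queue entry may join the current WRITE MULTIPLE."""
    if not next_is_quad:
        return False
    if abs(next_addr - current_addr) != QUADWORD_BYTES:
        return False                      # not the adjacent quadword (up or down)
    # Assumed rule: both quadwords must lie in the same 256-byte block.
    return current_addr // BLOCK_BYTES == next_addr // BLOCK_BYTES
```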



15.6 Read Around Write 

If an instruction cache read is posted, the read can pass around previously queued writes. 

If a data cache read is posted, the read can pass around previously queued writes only if
the address doesn't collide with a pending write. The write queue detects this address
collision and reports it to the internal BIF arbitration logic.
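
The two bypass rules can be expressed as one predicate. The comparison granularity is an assumption (quadword addresses are used here), and the function name is hypothetical.

```python
def read_may_bypass(read_quad_addr, pending_write_quad_addrs, instruction_read):
    """Return True if a posted read may pass around the queued writes."""
    if instruction_read:
        return True      # instruction cache reads always pass queued writes
    # Data cache reads pass only when no queued write targets the same quadword.
    return read_quad_addr not in pending_write_quad_addrs
```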

15.7 Write Parity 

Parity for both address and data is regenerated just before X-Bus transmission.
Chapter 16

Data Cache Interface



This chapter describes the CPU to X-Bus data cache interface.



16.1 Data Cache Read Miss

Processor operand loads are usually satisfied by the data cache. A data cache read miss
occurs when the data cache does not have the requested item. A cache read miss also
occurs when the read request must be forwarded to the bus regardless of whether cached
data is available. Typical of this latter situation is a read from an I/O control register.

Cache miss processing is the joint responsibility of the BIF and the MMU. The BIF
sources the fill address and informs the MMU as the data RAMs are written.

16.1.1 MMU Request to the BIF 

The MMU provides the read's 30-bit physical address on the PA bus. The MMU com- 
mand accompanies the physical address. 

The read's virtual page offset within segment (VPN) bits are presented before the physical
address and command. Typically, the BIF captures the 7 bits from the external EA register
during every cycle. If a read miss occurs, the physical address and command arrive in the
following cycle. If, however, the PA bus is not available in the following cycle, the MMU
asserts the signal MMU_HOLD_DVPN. The BIF holds the captured data cache VPN.
MMU_HOLD_DVPN is deasserted during the cycle in which the physical address and
command are finally sent to the BIF.
The commands that apply to data cache miss are summarized in Table 16-1. The shaded
areas do not apply to read misses.

Table 16-1. Data Cache Read Miss Command Codes

MEM_CMD(4:0)

00000   NOP                        10000   store.nolock.cache.1
00001   load.nolock.cache.16       10001   store.nolock.cache.2
00010   fetch.nolock.cache.32      10010   store.nolock.cache.4
00011   load.nolock.cache.64       10011   store.nolock.cache.8
00100   load.nolock.nocache.1      10100   store.nolock.nocache.1
00101   load.nolock.nocache.2      10101   store.nolock.nocache.2
00110   load.nolock.nocache.4      10110   store.nolock.nocache.4
00111   load.nolock.nocache.8      10111   store.nolock.nocache.8
01000   load.lock.nocache.1        11000   TB invalidate single
01001   load.lock.nocache.2        11001   TB invalidate all
01010   load.lock.nocache.4        11010   mmu store.unlock.nocache.4
01011   load.lock.nocache.8        11011   unassigned
01100   load.unlock.nocache.1      11100   store.unlock.nocache.1
01101   load.unlock.nocache.2      11101   store.unlock.nocache.2
01110   load.unlock.nocache.4      11110   store.unlock.nocache.4
01111   load.unlock.nocache.8      11111   store.unlock.nocache.8

16.1.2 Cacheable Data Read Miss

In the typical data cache miss, the MEM_CMD(4:0) field is either 00001,
LOAD.NOLOCK.CACHE.16, or the field is 00011, LOAD.NOLOCK.CACHE.64. The first
command requests a cache fill of 16 bytes. The second command requests a cache fill of
64 bytes. This second command is issued only if the cache miss is triggered by a 64-bit
floating-point load at an address boundary that is zero modulo 64.

The address presented with the data is the IP's exact load address. Before forwarding to
the X-Bus address, bit 3 must be unconditionally zeroed on a 16-byte fill. Address bits 5,
4, and 3 will naturally be zero on a 64-byte fill. This is required by the fill algorithm,
which is natural order beginning at the nearest lower byte boundary that is 0 modulo the
fill size. The address mask bits must be forced to all ones before transferring on the
X-Bus.
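
The fill ordering can be illustrated with a short sketch. It assumes byte addresses and clears all low-order bits below the fill size, which includes bit 3 for a 16-byte fill; the function name `fill_addresses` is hypothetical.

```python
def fill_addresses(load_addr: int, fill_size: int):
    """Return the quadword addresses of a cache fill, in natural (ascending) order."""
    if fill_size not in (16, 64):
        raise ValueError("only 16- and 64-byte fills are described")
    base = load_addr & ~(fill_size - 1)     # nearest lower boundary, 0 modulo fill size
    return [base + offset for offset in range(0, fill_size, 8)]
```

For a 16-byte fill triggered by a load at 0x1238, this yields 0x1230 followed by 0x1238, so the fill starts at the lower quadword regardless of which half the load touched.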



16.1.3 Unencacheable Data Read Miss

A load may reference data that is marked unencacheable. Load data may be declared
unencacheable for any of the following reasons:

• The PMAPE's C bit is set in the virtual address mapping tables.

• The memory reference address is a physical one because virtual translation is not
enabled.

• The memory reference address is a physical one required for an MMU table walk.

• The memory reference address is a physical one caused by a load.physical
instruction.

• The CPU's instruction is a load.lock, requiring access to the bus.

• The CPU's instruction is a load.unlock, requiring access to the bus.

The caching decision is made by the MMU and communicated within the MMU command
field. All of the remaining data cache miss codes (other than those just mentioned in the
last section) apply to unencacheable references.

In an unencacheable data cache miss, only the requested data is returned. The address
presented with the MMU command is forwarded, as is, to the X-Bus. The read mask is
appropriately constructed to reflect the request size. If the request is for an 8-byte
quantity, a read multiple of 2 longwords is the result.

16.1.4 Load.Lock 

The load.lock instruction requires access to the X-Bus to gain the bus lock. For this
reason an unencacheable data miss is declared by the MMU. When the load.lock's data
returns, the bus lock is secure.

The MMU may issue a second locking read request before a previously acquired lock is
released. The MMU may do so while processing a secondary TB miss during a locked
code sequence. The BIF properly nests the second request.

16.1.5 Load.Unlock

The load.unlock instruction requires access to the X-Bus to release the bus lock. For this
reason, the MMU declares an unencacheable data miss. When the load.unlock's data
returns, the bus lock is released. This instruction may be issued even when the bus lock is
not held. This instruction will not release a bus lock not held by this CPU.
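
The nesting behaviour of load.lock and load.unlock can be modelled with a counter. This is only a sketch of one plausible reading: the class `BusLock` and its fields are hypothetical, and it assumes an unlock at nesting level 0 releases the lock while an unlock at a higher level just unwinds one level.

```python
class BusLock:
    """Toy model of the bus lock state seen by one CPU's BIF."""

    def __init__(self):
        self.held = False
        self.nesting = 0

    def load_lock(self):
        if self.held:
            self.nesting += 1      # second locking read while the lock is held
        else:
            self.held = True

    def load_unlock(self):
        if not self.held:
            return                 # allowed, but releases nothing
        if self.nesting > 0:
            self.nesting -= 1      # unwind one nested request
        else:
            self.held = False      # nesting level 0: release the bus lock
```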
16.1.6 Data Cache Read Data Return

Once the data cache miss read address is transferred across the bus, the BIF awaits the read
data response. When the requested data returns, it is forwarded to the DATA(63:00) bus.
The data is then used by the IP, FP or MMU and is optionally stored in the cache. The
cache updating is referred to as filling.



16.1.6.1 Data Return Delay 

Normally, returning read data is forwarded to the DATA bus in the cycle immediately following
the data transfer on the X-Bus. DATA bus forwarding is delayed for one additional
cycle in the following cases:

• The X-Bus data returns in the same cycle that the EASRC bus is being used to
process an invalidate. A data cache fill cannot take place in the next cycle because
the EA doesn't hold the proper fill address.

• The X-Bus data returns in a cycle immediately after an instruction cache miss that
requires delayed data forwarding. The immediately abutting X-Bus data returns do
not allow removal of the instruction cache miss delay. The instruction cache fill
may collide when using the PC in the same manner as just described for the EA's use
during data cache fill.

• The data read request was unencacheable. In this case, the possible need to rotate
the returning read data requires an additional cycle of delay.

The data return delay is not visible to the MMU in handshake protocol.
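
The three delay cases can be summarized as a small predicate. The following is a minimal sketch only: the function and parameter names are hypothetical, and it assumes the cases do not stack (the text speaks of one additional cycle, not one per case).

```python
def data_forward_delay(invalidate_on_easrc: bool,
                       follows_delayed_icache_return: bool,
                       unencacheable: bool) -> int:
    """Return the number of cycles between the X-Bus data transfer and
    DATA bus forwarding (1 normally, 2 when any delay case applies).

    Assumption: the delay cases do not accumulate; any one of them
    adds exactly one cycle.
    """
    delayed = (invalidate_on_easrc
               or follows_delayed_icache_return
               or unencacheable)
    return 2 if delayed else 1
```

For example, a cacheable fill with no invalidate in flight and no abutting instruction return gives 1, while any unencacheable read gives 2.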



16.1.6.2 Data Return Alignment 

If the data read request is unencacheable, is for one longword or less, and the longword
address is even, the returning read data is duplicated on both halves of the cache data bus.
This is required by the MMU, which can access only DATA(31:00). In all other cases, the
returning data is aligned on the DATA bus as it appears on the X-Bus.
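
The alignment rule can be illustrated as follows. This is a sketch under stated assumptions, not the BIF's actual datapath: it assumes that an even longword travels on X-Bus bits 63:32 and that "even" means bit 2 of the byte address is zero. Neither detail is stated in this section, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def align_return_data(xbus_data: int, byte_address: int,
                      size_bytes: int, unencacheable: bool) -> int:
    """Return the 64-bit value placed on the DATA bus for a read return."""
    # ASSUMPTION: the longword address is even when byte address bit 2 is 0.
    longword_is_even = ((byte_address >> 2) & 1) == 0
    if unencacheable and size_bytes <= 4 and longword_is_even:
        # ASSUMPTION: the even longword arrives on X-Bus bits 63:32.
        longword = (xbus_data >> 32) & MASK32
        # Duplicate it on both halves so DATA(31:00) carries it too.
        return (longword << 32) | longword
    # All other cases: data appears on DATA as it appears on the X-Bus.
    return xbus_data & MASK64
```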



16.1.6.3 Data Cache Fill Data Sourcing / MEM_RESP

If the data cache read miss is for a 16- or 64-byte fill, the requested data is provided, 8
bytes at a time, on the X-Bus. The data is then forwarded, 8 bytes at a time, to the
DATA bus and written simultaneously with the IP or FP accepting the data.

The BIF begins to drive returning X-Bus data before X-Bus Read Response data has arrived.
The BIF first drives the bus in the cycle after the data cache miss MEM_CMD has
been driven by the MMU.



Simultaneously with the DATA bus driving, the MMU sources the MEM_RESP(2:0) field.
Typically, code 001 is driven. Codes 100 and 101 are driven in the event of a bus error.
The data cache filling is strictly slaved to the X-Bus timing and normally takes place in
uninterrupted cycles. See the ECCU/ECCC subsections for the exceptions to this rule.

Table 16-2. MEM_RESP[2:0] Field Codes, Data Cache Fill

MEM_RESP[2:0] - Data Cache Miss

000    NOP
001    Dcache Data Return
010    -
011    -
100    Load ECCU
101    Load No Response
110    -
111    -

16.1.6.4 Data Cache Fill Parity Sourcing 

The returning data parity is regenerated while the data is on the DATA bus. If the request
is a 16- or 64-byte fill, the parity is written into the data cache parity RAMs during the
following cycle. Byte parity is maintained in the data cache.



16.1.6.5 Data Cache Fill Address Sourcing / BIF_PAARB BIF_INVOP

If the data cache read miss is for a 16- or 64-byte fill, the fill index is sourced by the BIF
on the PA bus. The BIF requests the use of the PA bus one cycle before the address
transfer (two cycles before the DATA transfer) by asserting the BIF_PAARB(1:0) signals.
BIF_PAARB = 01 requests the joint use of the PA bus and the EASRC bus in anticipation
of data cache fill. If there are simultaneous instruction and data cache misses posted,
BIF_PAARB = 11 is asserted. This requests both the PCSRC and EASRC buses, in case
either returns on the bus.

The BIF begins requesting the PA bus before X-Bus Read Response data has arrived. The 
BIF first makes an arbitration request on the PAARB signals in the X-Bus acknowledge 
cycle for the miss read address transfer. 

The BIF_PAARB codes are summarized in Table 16-3.

Table 16-3. BIF_PAARB[1:0] Arbitration Codes, Data Cache Fill

BIF_PAARB[1:0]

00    NOP
01    Arbitrate for PA/EASRC : cache fill or invalidate
10    -
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate



The BIF sources the 13-bit fill index on PA(15:03) one cycle before the DATA transfer.
Simultaneously, the BIF requests setting the data cache tag's 8 VALID bits in that next
cycle by deasserting the BIF_INVOP(2:0) signals. BIF_INVOP = 00 implies setting the
valid bits. Table 16-4 lists a summary of the BIF_INVOP(2:0) bit codes.

Table 16-4. BIF_INVOP[2:0] Field Codes, Data Cache Fill

BIF_INVOP[2:0]

000    0    NOP
001    1    RESET VALID BITS
010    2    Selective TB Invalidate
011    3    Comprehensive TB Invalidate
100    4    Fill
101    5    Diagnostic Fill
110    6    undefined
111    7    undefined



16.1.6.6 Data Cache Fill: MMU Tracking

While the BIF sources both the data and fill address, the MMU provides the RAM strobes
and tag contents. The MMU does so in response to the BIF_PAARB and BIF_INVOP
signals. The BIF sources these signals without knowing about return data availability. The
BIF informs the MMU that data has been written by using the MEM_RESP(2:0) signals.

The MMU assumes that the fill is complete by the next cycle when the final fill entry index
is on the PA bus, and there is no request on the BIF_PAARB signals. If the fill does
not complete in this cycle, both the MMU and BIF back up and try again. The MMU recognizes
this situation by observing that the MEM_RESP field is 000 (NOP) in the cycle
which should be the last RAM data write.



16.1.7 Data Cache Read Miss Errors

Many errors are possible while processing a data cache read miss. They are summarized in
this section.



16.1.7.1 External Invalidate Collision

In the interval between the read address transfer on the X-Bus and the read data return, a
write to the returning data from another CPU is possible. The BIF watches for this situation
and detects any write-read collision on the same physical page. If a collision is detected,
the BIF_INVOP signals are asserted, rather than deasserted, in the cycle before the
data cache write. BIF_INVOP = 01 resets the tag's 8 valid bits.

Table 16-5. BIF_INVOP[1:0] Codes, External Invalidate Collision

BIF_INVOP[1:0]

00    NOP
01    Reset Data/Inst Tag Valid Bits
10    -
11    -


This write-read collision detection applies only to an external write. A locally generated
write is only issued on the X-Bus subsequent to a data cache read if the write was generated
earlier and does not conflict with the read address.



16.1.7.2 Bus Acquisition Timeout 

The bus acquisition timer elapsing before the data cache read gains access to the bus
indicates a hardware failure. The BIF requests the clocks to stop and records this error
status in scan state. The BIF continues to arbitrate for the bus.



16.1.7.3 No Acknowledge 

A data cache miss address transfer that results in no bus acknowledge indicates a software
failure. The BIF records this error status in the BCTRL register and freezes the ERRADDR
register. The BIF returns a LOAD_NO_RESPONSE code, 101, on the MEM_RESP(2:0)
signals.

16.1.7.4 Error Acknowledge

A data cache miss address transfer that results in an error bus acknowledge indicates a
hardware failure. The BIF records this error status in scan state. Otherwise, the BIF acts
as if it is a busy acknowledge to preserve state.



16.1.7.5 Read Return Timeout 

The read return timer elapsing before the data cache read data completely returns indicates
a hardware failure. The BIF requests the clocks to stop, and records this error in the
scan state. It continues to await read return data.



16.1.7.6 ECCU 

A device error may prevent correct data return. The most common such error is a main
memory ECCU. This same situation also occurs when a secondary bus receives a read
timeout.

When only incorrect data can be returned, a READ RESPONSE ERROR command is returned
on the X-Bus. The BIF, in turn, terminates the transfer. The MMU_RESP(2:0)
code LOAD ECCU, 100, is sent to the MMU.

Once the READ RESPONSE ERROR occurs as one response in a READ MULTIPLE, no
further response data can be accepted from the X-Bus.



16.1.7.7 ECCC 

A correctable data error can occur upon access to main store. If this happens in an unencacheable
reference, it is not visible to the MMU. If this happens in a 16- or 64-byte fill,
it may result in the interpositioning of NOPs within the returning X-Bus read data. When a
NOP interrupts this sequence, there are always at least 2 NOPs present.

When the NOP interrupts the fill sequence, the BIF writes incorrect data to the RAMs.
The BIF then backs up the fill address by eight bytes, awaits the corrected data, and rewrites
the RAM location.

When the NOP arrives instead of the last 8 bytes of read return data, there is an additional
complication: the BIF may have relinquished control of the PA bus. The MMU recognizes
this situation and holds the processor stall. The BIF rearbitrates for the PA and
EASRC buses, sources the last fill address, and waits for corrected data. The BIF needs
two NOPs to arbitrate and then resupply the former fill address.

If a data returning X-Bus sequence is interrupted by NOPs, the responder asserts ARB
INHIBIT to prevent another device from gaining access to the bus. The BIF does not have
to be prepared to handle external invalidates or instruction cache read data responses during
such an interruption.
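
The back-up-and-rewrite behaviour for a NOP arriving in mid-fill can be modelled in a few lines. This is a behavioural sketch only, not a cycle-accurate description: the `NOP` marker, the dictionary standing in for the cache RAM, and the function name are all hypothetical, and PA bus rearbitration for the last-beat case is not modelled.

```python
NOP = None  # hypothetical marker: an X-Bus cycle carrying no read data

def fill_with_retry(bus_cycles, fill_base, beats):
    """Model a cache fill of `beats` 8-byte entries starting at `fill_base`.

    Each element of `bus_cycles` is either an 8-byte data value or NOP.
    Returns a dict mapping RAM address to the value finally stored there.
    """
    ram = {}
    address = fill_base
    end = fill_base + 8 * beats
    for item in bus_cycles:
        if address >= end:
            break
        if item is NOP:
            # Incorrect data is written, then the fill address is backed
            # up by eight bytes, so the same location is rewritten later.
            ram[address] = "incorrect"
            continue
        ram[address] = item  # correct (or corrected) data
        address += 8
    return ram
```

Feeding the model `[10, NOP, NOP, 11]` for a two-entry fill leaves the second location holding 11: the interim incorrect write is overwritten once the corrected data arrives.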



16.2 Data Cache Invalidates

Data cache invalidates may be posted from the BIF to the data cache.

16.2.1 Data Cache Invalidate Address Sourcing / BIF_PAARB BIF_INVOP

The BIF provides only the invalidate index for the cache location to be purged. The address
is transferred over the PA bus. The BIF requests the use of the bus one cycle before
the address transfer (two cycles before the tag invalidate) by asserting the
BIF_PAARB(1:0) signals. BIF_PAARB = 01 requests the joint use of the PA bus and the
EASRC bus. BIF_PAARB = 11 requests the joint use of the PA bus, EASRC bus, and
PCSRC bus. The BIF uses this code to invalidate both caches.

Table 16-6. BIF_PAARB[1:0] Field Codes, Data Cache Invalidate Address Sourcing

BIF_PAARB[1:0]

00    NOP
01    Arbitrate for PA/EASRC : cache fill or invalidate
10    Arbitrate for PA/PCSRC : cache fill or invalidate
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate
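
The arbitration codes in the table above map directly onto sets of requested buses. A minimal sketch of that mapping follows; the dictionary and function names are hypothetical and not part of the hardware description.

```python
# BIF_PAARB(1:0) code -> buses requested, per the table above.
BIF_PAARB_BUSES = {
    0b00: frozenset(),                          # NOP: nothing requested
    0b01: frozenset({"PA", "EASRC"}),           # data cache fill/invalidate
    0b10: frozenset({"PA", "PCSRC"}),           # instruction cache fill/invalidate
    0b11: frozenset({"PA", "EASRC", "PCSRC"}),  # both caches
}

def requested_buses(paarb: int) -> frozenset:
    """Return the set of buses requested by a 2-bit BIF_PAARB code."""
    return BIF_PAARB_BUSES[paarb & 0b11]

def paarb_for_invalidate(both_caches: bool) -> int:
    """Code used for a data cache invalidate, or for both caches."""
    return 0b11 if both_caches else 0b01
```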



The 13-bit invalidate index is on PA(15:03) one cycle before the tag RAM write. Simultaneously,
the BIF requests clearing the data cache tag's 8 VALID bits in that next cycle by
asserting the BIF_INVOP(1:0) signals. BIF_INVOP = 01 resets the tag's 8 valid bits.

Table 16-7. BIF_INVOP[2:0] Field Codes, Data Cache Invalidate Address Sourcing

BIF_INVOP[2:0]

000    0    NOP
001    1    RESET VALID BITS
010    2    Selective TB Invalidate
011    3    Comprehensive TB Invalidate
100    4    Fill
101    5    Diagnostic Fill
110    6    undefined
111    7    undefined

16.3 Data Cache Writes

The BIF writes processor store data to the data cache and forwards it to the X-Bus. This
write-through-cache strategy requires the BIF to handle processor writes effectively.

Unlike reads, the CPU does not wait for a write request completion. The BIF simply
queues the write data and address. This decouples the CPU from X-Bus acquisition latency.

16.3.1 MMU Request to the BIF 

The MMU provides the write's 30-bit physical address on the PA bus. The MMU command
accompanies the physical address.

The write's virtual page offset within segment (VPN) bits are presented before the physical
address and command. Typically, the BIF captures the 7 bits from the external EA register
during every cycle. If a write occurs, the physical address and command arrive during the
following cycle. If, however, the PA bus is not available in this succeeding cycle, the MMU
asserts the signal MMU_HOLD_DVPN. The BIF holds the captured data cache VPN.
MMU_HOLD_DVPN is deasserted during the cycle in which the physical address and
command are finally sent to the BIF.

Properly aligned write data is also presented before the physical address and command.
Typically, the 64 bits are captured by the BIF directly from the DATA bus during every
cycle. Again, the physical address and command arrive in the following cycle. If, however,
the PA bus is not available in this succeeding cycle, or a write buffer full stall is in effect,
the MMU deasserts the signal MMU_HDATA_LD. The BIF holds the captured data.
MMU_HDATA_LD is reasserted during the cycle in which the physical address and
command are finally sent to the BIF.

There are many commands that apply to data cache write. They are summarized in
Table 16-8. The shaded areas of the table do not apply.

Table 16-8. MEM_CMD[4:0] Codes, Data Cache Writes

MEM_CMD[4:0]

00000    NOP                       10000    store.nolock.cache.1
00001    load.nolock.cache.16      10001    store.nolock.cache.2
00010    fetch.nolock.cache.32     10010    store.nolock.cache.4
00011    load.nolock.cache.64      10011    store.nolock.cache.8
00100    load.nolock.nocache.1     10100    store.nolock.nocache.1
00101    load.nolock.nocache.2     10101    store.nolock.nocache.2
00110    load.nolock.nocache.4     10110    store.nolock.nocache.4
00111    load.nolock.nocache.8     10111    store.nolock.nocache.8
01000    load.lock.nocache.1       11000    TB invalidate single
01001    load.lock.nocache.2       11001    TB invalidate all
01010    load.lock.nocache.4       11010    mmu_store.unlock.nocache.4
01011    load.lock.nocache.8       11011    unassigned
01100    load.unlock.nocache.1     11100    store.unlock.nocache.1
01101    load.unlock.nocache.2     11101    store.unlock.nocache.2
01110    load.unlock.nocache.4     11110    store.unlock.nocache.4
01111    load.unlock.nocache.8     11111    store.unlock.nocache.8
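
The 5-bit command encoding in the table above is regular enough to build programmatically. The sketch below reconstructs the full 32-entry table; the dictionary and helper names are hypothetical, and only the code-to-mnemonic mapping itself is taken from the table.

```python
SIZES = (1, 2, 4, 8)  # request size in bytes, selected by the low two bits

# Irregular entries are listed explicitly.
MEM_CMD = {
    0b00000: "NOP",
    0b00001: "load.nolock.cache.16",
    0b00010: "fetch.nolock.cache.32",
    0b00011: "load.nolock.cache.64",
    0b11000: "TB invalidate single",
    0b11001: "TB invalidate all",
    0b11010: "mmu_store.unlock.nocache.4",
    0b11011: "unassigned",
}

# Regular groups: a base code plus one entry per request size.
for base, prefix in ((0b00100, "load.nolock.nocache"),
                     (0b01000, "load.lock.nocache"),
                     (0b01100, "load.unlock.nocache"),
                     (0b10000, "store.nolock.cache"),
                     (0b10100, "store.nolock.nocache"),
                     (0b11100, "store.unlock.nocache")):
    for low_bits, size in enumerate(SIZES):
        MEM_CMD[base | low_bits] = f"{prefix}.{size}"

def is_cacheable_store(cmd: int) -> bool:
    """True for the cacheable store codes 10000 through 10011."""
    return 0b10000 <= cmd <= 0b10011
```

A lookup such as `MEM_CMD[0b10110]` then yields the mnemonic for that code, and `is_cacheable_store` separates the four cacheable store codes from the unencacheable ones.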



16.3.2 Cacheable Data Store

In the typical data cache store, the MEM_CMD(4:0) field ranges from 10000 to 10011,
STORE.NOLOCK.CACHE.byte_count. The commands just indicate the store's request
size. The address presented with the command is the IP's exact store address. Cacheable
store data may be combined with previously issued cacheable store data to compose larger
X-Bus transactions.

16.3.3 Unencacheable Data Store

A store may also be declared unencacheable for one of the following reasons.

• The PMAPE's C bit is set in the virtual address mapping tables.

• The memory reference address is a physical one. Virtual translation isn't enabled.

• The memory reference address is a physical one required for an MMU table walk.

• The CPU's instruction is a store.unlock, requiring access to the bus.

The MMU makes the caching decision and communicates it in the MMU command field.
All of the remaining data store command codes (other than those previously mentioned)
apply to unencacheable references. Write compaction is not permitted during an unencacheable
data cache store. The MMU forwards the address presented with the MMU
command, as is, to the X-Bus. The write mask is appropriately constructed to reflect the
exact request size. If the request is for an 8-byte quantity, a write multiple of 2 longwords
results.

16.3.4 STORE.UNLOCK 

The STORE.UNLOCK instruction is handled no differently than any other unencacheable
store except that the bus lock may be released as a side-effect of the X-Bus request completion.
The IP assumes the bus lock is released as soon as the write is queued.

The MMU may issue a second locking read request before a previously acquired lock is
released. The MMU may do so while processing a secondary TB miss during a locked
code sequence. The BIF properly nests this second request and requires two store.unlocks
before releasing the bus.

MMU_STORE.UNLOCK differs from other store.unlocks in that the write data will always
be provided in the least significant 32 bits. The longword store address, when it is even,
requires a special write rotation before the data may be presented to the X-Bus. This instruction
may be issued even when the bus lock is not held. This instruction does not release
a bus lock not held by this CPU.
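
The lock nesting described in this section behaves like a depth counter. The following sketch illustrates that reading; the class and method names are hypothetical, and support for nesting deeper than two levels is an assumption (the text only describes a second nested request).

```python
class BusLockTracker:
    """Tracks nested bus lock requests from one CPU."""

    def __init__(self) -> None:
        self.depth = 0  # number of outstanding locking reads

    @property
    def held(self) -> bool:
        return self.depth > 0

    def locking_read(self) -> None:
        """A load.lock has completed: one more unlock is now required."""
        self.depth += 1

    def unlock(self) -> bool:
        """A store.unlock (or load.unlock) has completed.

        Returns True only if this unlock actually released the bus lock.
        An unlock issued while no lock is held is legal and does nothing.
        """
        if self.depth == 0:
            return False
        self.depth -= 1
        return self.depth == 0
```

With two locking reads outstanding, the first unlock returns False (the lock is still held) and the second returns True.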

16.3.5 Write Buffer Full

When the BIF can't accept much more store data, it asserts the signal WBUF_FULL to
the MMU to generate back pressure. The MMU interprets this signal to mean that if there
is currently a store in its data cache access phase, that store data can be accepted but the
address cannot. This means that the store must stall in its exception phase.

16.3.6 Data Cache Write Errors

Because X-Bus writes are one-way transfers, device errors such as auxiliary bus timeouts,
ECCC's and ECCU's must be detected and recorded at the write's destination. The few
errors that are possible in the course of processing a data cache write are summarized in
this section.



16.3.6.1 Bus Acquisition Timeout 

The bus acquisition timer elapsing before the data cache write gains access to the bus indicates
a hardware failure. The BIF requests the clocks to stop and records this error in scan
state. The BIF continues to request the bus.

16.3.6.2 No Acknowledge

The data cache write address transfer resulting in no bus acknowledge indicates a software
failure. The BIF records this error status in the BCTRL register and freezes the ERRADDR
register. The write request is ignored.

16.3.6.3 Error Acknowledge

The data cache write address transfer resulting in an error bus acknowledge indicates a
hardware failure. The BIF records the error status in scan state, but otherwise treats the
acknowledge as a busy one to preserve state.

16.4 TB Invalidates

Translation Buffer invalidates may be both posted by the MMU for forwarding to the X-Bus,
or relayed from the X-Bus, by the BIF, to the MMU.

16.4.1 Invalidates from the MMU

Similar to data cache writes, the CPU does not wait for a TB invalidate completion. The
MMU relays and the BIF queues the TB invalidate request. There are both selective and
comprehensive TB invalidates. There is one MMU_CMD(4:0) code for each. Code 11000
indicates a selective TB invalidate. A 20-bit virtual address is expected to accompany it.
The MMU provides the virtual address on PA(01:00) || PA(29:12). The address is relayed
to the X-Bus where it appears in the address bit positions 31 through 12. Code 11001
identifies a comprehensive TB invalidate. No address is required in this case. No VPN is
associated with a TB invalidate. No data is associated with a TB invalidate.
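
The address relay for a selective TB invalidate amounts to a small bit-field rearrangement. The sketch below assumes that "||" denotes concatenation with PA(01:00) as the two most significant bits, so that PA(01:00) carries virtual address bits 31:30 and PA(29:12) carries bits 29:12. The function names are hypothetical.

```python
def xbus_tb_invalidate_address(pa: int) -> int:
    """Rebuild X-Bus address bits 31:12 from PA(01:00) || PA(29:12)."""
    high2 = pa & 0x3                # PA(01:00): assumed VA bits 31:30
    low18 = (pa >> 12) & 0x3FFFF    # PA(29:12): VA bits 29:12
    vpn20 = (high2 << 18) | low18   # 20-bit virtual page number
    return vpn20 << 12              # placed in address bits 31:12

def pa_for_tb_invalidate(virtual_address: int) -> int:
    """Inverse mapping: pack a 32-bit virtual address onto the PA bus."""
    return ((virtual_address >> 30) & 0x3) | (virtual_address & 0x3FFFF000)
```

Under these assumptions the round trip preserves the upper 20 bits of the virtual address and discards the page offset.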

Table 16-9. MEM_CMD[4:0] Codes, TB Invalidates

MEM_CMD[4:0]

00000    NOP                       10000    store.nolock.cache.1
00001    load.nolock.cache.16      10001    store.nolock.cache.2
00010    fetch.nolock.cache.32     10010    store.nolock.cache.4
00011    load.nolock.cache.64      10011    store.nolock.cache.8
00100    load.nolock.nocache.1     10100    store.nolock.nocache.1
00101    load.nolock.nocache.2     10101    store.nolock.nocache.2
00110    load.nolock.nocache.4     10110    store.nolock.nocache.4
00111    load.nolock.nocache.8     10111    store.nolock.nocache.8
01000    load.lock.nocache.1       11000    TB invalidate single
01001    load.lock.nocache.2       11001    TB invalidate all
01010    load.lock.nocache.4       11010    mmu_store.unlock.nocache.4
01011    load.lock.nocache.8       11011    unassigned
01100    load.unlock.nocache.1     11100    store.unlock.nocache.1
01101    load.unlock.nocache.2     11101    store.unlock.nocache.2
01110    load.unlock.nocache.4     11110    store.unlock.nocache.4
01111    load.unlock.nocache.8     11111    store.unlock.nocache.8

16.4.2 Invalidates from the MMU: Write Buffer Full

TB invalidates, both selective and comprehensive, occupy a position in the write queue.
Consequently, they can result in write buffer full stalls. If the BIF is unable to accept another
TB invalidate or more store data, the BIF asserts the WBUF_FULL signal.

16.4.3 Invalidates from the MMU: Bus Errors

Only two errors are possible in transmitting a TB invalidate on the X-Bus: failure to secure
the bus, and a parity error upon transmission.

16.4.3.1 Bus Acquisition Timeout

The bus acquisition timer elapsing before the TB invalidate gains access to the bus indicates
a hardware failure. The BIF requests the clocks to stop and records this as a write
error in the scan state. The BIF continues to request the bus.

16.4.3.2 Error Acknowledge

The TB invalidate transfer resulting in an error bus acknowledge indicates a hardware failure.
The BIF records this as a write error in the scan state. The BIF otherwise treats this
acknowledge as a busy one to preserve state.

16.4.4 Invalidates to the MMU

The BIF forwards incoming TB invalidates to the MMU. The forwarding follows the cache
invalidate pipeline. Both selective and comprehensive TB invalidates may be posted to the
MMU. The BIF sources a 20-bit virtual page number on the PA bus when a selective TB
invalidate is required. If a comprehensive invalidate is desired, no address is required. The
BIF arbitrates for, and secures, the PA bus.

16.4.4.1 External Selective TB Invalidate Address Format

Incoming TB invalidate addresses are right shifted before being sent across the PA bus.
The VPN bits 31 through 12 are aligned on the PA bus in bit positions 22 through 3.
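
Moving bits 31:12 into positions 22:3 is a right shift by nine places. A minimal sketch, with a hypothetical function name:

```python
def pa_for_external_tb_invalidate(xbus_address: int) -> int:
    """Place X-Bus address bits 31:12 (the VPN) on PA bits 22:3."""
    vpn = (xbus_address >> 12) & 0xFFFFF  # 20-bit virtual page number
    return vpn << 3                       # net effect: right shift by 9
```

The page offset (bits 11:0) is dropped, and PA bits 2:0 are left as zero in this sketch.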

16.4.4.2 External TB Invalidate Address Sourcing / BIF_PAARB BIF_INVOP

The BIF uses the BIF_PAARB signals to request the PA bus to transfer the invalidate address.
The BIF usually requests the use of PA and EASRC buses, BIF_PAARB = 01. If an
instruction cache fill is underway at the same time, BIF_PAARB = 11 is driven. The decision
to do an instruction cache fill or TB invalidate is then deferred one cycle.

Table 16-10. BIF_PAARB[1:0] Codes, External TB Invalidate Address Sourcing

BIF_PAARB[1:0]

00    NOP
01    Arbitrate for PA/EASRC : cache fill or invalidate
10    Arbitrate for PA/PCSRC : cache fill or invalidate
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate



Either a selective TB invalidate or a comprehensive TB invalidate is requested in the same
cycle as the PA bus use. If selective, the TB invalidate index is on the PA bus. The BIF requests
the selective TB invalidate by setting BIF_INVOP = 10. If a comprehensive TB invalidate
is desired, the BIF sets BIF_INVOP = 11.

Table 16-11. BIF_INVOP[2:0] Codes, External TB Invalidate Address Sourcing

BIF_INVOP[2:0]

000    0    NOP
001    1    RESET VALID BITS
010    2    Selective TB Invalidate
011    3    Comprehensive TB Invalidate
100    4    Fill
101    5    Diagnostic Fill
110    6    undefined
111    7    undefined



Chapter 17

Instruction Cache 



17.1 Instruction Cache Read Miss 

Processor instruction fetches are usually satisfied by the instruction cache. An instruction
cache read miss occurs when the instruction cache does not presently contain the requested
instruction.

In the main, instruction cache read miss processing parallels that of data cache read miss.
The major differences result because there are fewer requests within instruction cache miss.


17.1.1 MMU Request to the BIF 

The MMU provides the fetch's 30-bit physical address on the PA bus. The MMU command
accompanies the physical address.

The read's virtual page offset within segment (VPN) bits are presented before the physical
address and command. Typically, the BIF captures the 7 bits from the external PC register
during every cycle. If an instruction cache miss occurs, the earliest the physical address
and command can arrive is the following cycle. If, however, the PA bus is not used or is
otherwise unavailable in this succeeding cycle, the MMU asserts the MMU_HOLD_IVPN
signal. The BIF holds the captured instruction cache VPN. MMU_HOLD_IVPN is deasserted
during the cycle in which the physical address and command are finally sent to the
BIF.

There is only one command that applies to instruction cache miss. 



EP 0 366 434 A2 



Apollo Preliminary and Confidential 



Table 17-1. MEM_CMD[4:0] Codes, Instruction Cache Miss

MEM_CMD[4:0]

00000    NOP                       10000    store.nolock.cache.1
00001    load.nolock.cache.16      10001    store.nolock.cache.2
00010    fetch.nolock.cache.32     10010    store.nolock.cache.4
00011    load.nolock.cache.64      10011    store.nolock.cache.8
00100    load.nolock.nocache.1     10100    store.nolock.nocache.1
00101    load.nolock.nocache.2     10101    store.nolock.nocache.2
00110    load.nolock.nocache.4     10110    store.nolock.nocache.4
00111    load.nolock.nocache.8     10111    store.nolock.nocache.8
01000    load.lock.nocache.1       11000    TB invalidate single
01001    load.lock.nocache.2       11001    TB invalidate all
01010    load.lock.nocache.4       11010    mmu_store.unlock.nocache.4
01011    load.lock.nocache.8       11011    unassigned
01100    load.unlock.nocache.1     11100    store.unlock.nocache.1
01101    load.unlock.nocache.2     11101    store.unlock.nocache.2
01110    load.unlock.nocache.4     11110    store.unlock.nocache.4
01111    load.unlock.nocache.8     11111    store.unlock.nocache.8



All instruction cache misses are cacheable and 32 bytes long.

The address presented with the command is the IP's exact fetch address. Before forwarding
to the X-Bus, address bits 3 and 4 must be unconditionally zeroed. This is required by
the fill algorithm, which is natural order beginning at the nearest lower byte boundary that
is 0 modulo 32. The address mask bits must be forced to all ones before transferring on
the X-Bus.
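
The address adjustment and the resulting natural-order fill sequence can be sketched as follows. The function names are hypothetical. The fill sequence additionally assumes that address bits 2:0 are not significant for the 8-byte transfers, which this section does not state explicitly.

```python
def icache_miss_xbus_address(fetch_address: int) -> int:
    """Zero address bits 3 and 4 before forwarding to the X-Bus."""
    return fetch_address & ~0x18

def icache_fill_addresses(fetch_address: int) -> list:
    """8-byte fill addresses in natural order from the 32-byte boundary.

    ASSUMPTION: bits 2:0 are ignored, so the fill starts at the address
    rounded down to a multiple of 32.
    """
    base = fetch_address & ~0x1F
    return [base + offset for offset in range(0, 32, 8)]
```

For a fetch at 0x1238, the forwarded address is 0x1220 and the four fill entries are 0x1220, 0x1228, 0x1230 and 0x1238, in that order.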

17.1.2 Instruction Cache Read Data Return 

Once the instruction cache miss read address is transferred across the X-Bus, the BIF
awaits read data response. When the requested data finally returns, it is forwarded to the
INST(63:00) bus. The instruction is then stored in the cache.

17.1.2.1 Instruction Return Delay

Normally, returning memory data is forwarded to the INST bus during the cycle immediately
following the data transfer on the X-Bus. In some cases, however, INST bus forwarding
is delayed one additional cycle. The following cases summarize this.

• The X-Bus data returns during the same cycle that the PCSRC bus is being used
to process an invalidate. An instruction cache fill cannot take place in the next
cycle because the PC will not hold the proper fill address.

• The X-Bus data returns in a cycle immediately after a data cache miss that required
an insertion delay. The immediately abutting data and instruction fill data
responses on the X-Bus don't allow for removing the data cache miss's delay.

The data return delay is not visible to the MMU in handshake protocol.

17.1.2.2 Instruction Return Alignment 

The instruction data is always aligned on the INST bus as it appears on the X-Bus. See
the ECCU/ECCC section for the exceptions.

17.1.2.3 Instruction Cache Fill Data Sourcing / MEM_RESP

The instruction cache data is provided, 8 bytes at a time, on the X-Bus, and is forwarded
to the INST bus. The instruction cache filling is strictly slaved to the X-Bus timing and
normally takes place in uninterrupted cycles. The BIF begins driving returning X-Bus data
before X-Bus Read Response data has arrived. The BIF first drives the INST bus during
the cycle after the instruction cache miss MEM_CMD has been driven by the MMU.

The MEM_RESP(2:0) field is sourced by the MMU at the same time as the BIF drives
the INST bus. Typically, code 010 is driven. Codes 110 and 111 are driven in the event of
a bus error. The instruction cache filling is strictly slaved to the X-Bus timing and normally
takes place in uninterrupted cycles. See the ECCU/ECCC section for the exceptions.

Table 17-2. MEM_RESP[2:0] Codes, Instruction Cache Fill Data Sourcing

MEM_RESP[2:0] - Data Cache Miss

000    NOP
001    Dcache Data Return
010    Icache Data Return
011    undefined
100    Load ECCU
101    Load No Response
110    I Fetch ECCU
111    I Fetch No Response
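
The eight response codes in the table above can be captured as an enumeration, which makes the error codes easy to test for. This is an illustrative sketch; the Python identifiers are hypothetical renderings of the table entries.

```python
from enum import IntEnum

class MemResp(IntEnum):
    """MEM_RESP(2:0) codes, transcribed from the table above."""
    NOP = 0b000
    DCACHE_DATA_RETURN = 0b001
    ICACHE_DATA_RETURN = 0b010
    UNDEFINED = 0b011
    LOAD_ECCU = 0b100
    LOAD_NO_RESPONSE = 0b101
    FETCH_ECCU = 0b110
    FETCH_NO_RESPONSE = 0b111

# Codes driven in the event of a bus error.
ERROR_CODES = frozenset({MemResp.LOAD_ECCU, MemResp.LOAD_NO_RESPONSE,
                         MemResp.FETCH_ECCU, MemResp.FETCH_NO_RESPONSE})

def is_error(code: int) -> bool:
    """True if the 3-bit code reports an ECCU or no-response error."""
    return MemResp(code & 0b111) in ERROR_CODES
```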



17.1.2.4 Instruction Cache Fill Parity Sourcing 

The returning instruction parity is regenerated while the data is on the INST bus. It is written
into the instruction cache parity RAMs during the following cycle. One bit of parity is
maintained over all even instruction bytes, and one over all odd instruction bytes.
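
The two-bit parity scheme can be illustrated as follows. This sketch makes two assumptions that this section does not state: bytes are numbered from the most significant end of the 64-bit word (byte 0 is bits 63:56), and each parity bit is the XOR of all data bits it covers (even parity). The function name is hypothetical.

```python
from typing import Tuple

def icache_parity(word64: int) -> Tuple[int, int]:
    """Return (even_byte_parity, odd_byte_parity) for one 8-byte entry."""
    # ASSUMPTION: byte 0 is the most significant byte of the word.
    data = word64.to_bytes(8, "big")
    even_parity = 0
    odd_parity = 0
    for index, byte in enumerate(data):
        # ASSUMPTION: parity is the XOR of all covered bits.
        byte_parity = bin(byte).count("1") & 1
        if index % 2 == 0:
            even_parity ^= byte_parity
        else:
            odd_parity ^= byte_parity
    return even_parity, odd_parity
```

The data cache, by contrast, maintains one parity bit per byte.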



17.1.2.5 Instruction Cache Fill Address Sourcing / BIF_PAARB BIF_INVOP

The BIF sources the instruction cache fill index on the PA bus. The BIF requests the PA
bus one cycle before the address transfer (two cycles before the INST transfer) by asserting
the BIF_PAARB(1:0) signals. BIF_PAARB = 10 requests the joint use of the PA bus and
the PCSRC bus. BIF_PAARB = 11 additionally requests the use of the EASRC bus. BIF_PAARB = 11
is only used if instruction cache miss and data cache miss are concurrently underway on
the X-Bus.

The BIF begins requesting the PA bus before X-Bus Read Response data has arrived. The 
BIF first makes an arbitration request on the PAARB signals during the X-Bus acknowl- 
edge cycle for the instruction miss read address transfer. 

Table 17-3. BIF_PAARB[1:0] Codes, Instruction Cache Fill Address Sourcing

BIF_PAARB[1:0]

00    NOP
01    -
10    Arbitrate for PA/PCSRC : cache fill or invalidate
11    Arbitrate for PA/EA/PCSRC : cache fill or invalidate

The BIF sources the 14-bit fill index on PA(29:16) one cycle before the INST transfer.
Simultaneously, the BIF attempts to set the instruction cache tag's VALID bit during that
next cycle by deasserting the BIF_INVOP signals.

Table 17-4. BIF_INVOP[1:0] Codes, Instruction Cache Fill Address Sourcing

BIF_INVOP[1:0]

00    NOP
01    -
10    -
11    -


17.1.2.6 Instruction Cache Fill: MMU Tracking

While the BIF sources both the data and fill address, the MMU provides both the RAM
strobes and tag contents. The MMU does so in response to the BIF_PAARB and BIF_INVOP
signals. The BIF sources these signals without knowing about return data availability.
The BIF informs the MMU that data has been written after the fact, via the
MEM_RESP(2:0) signals.

The MMU assumes that the fill completes during the next cycle when the final fill entry
index is on the PA bus and there is no request on the BIF_PAARB signals. If, for some
reason, the fill does not complete during this cycle, both the MMU and BIF back up and
try again. The MMU recognizes this situation because the MEM_RESP field is 000 (NOP)
during the cycle which should have been the last RAM data write.

17.1.3 Instruction Stream Writes

The hardware makes no attempt to interlock stores with instruction stream reads. If a program
wishes to update the instruction stream, it must follow this sequence:

• Execute the store.

• Execute a load.unlock. This assures that the store has completed on the X-Bus.

• Wait for the invalidate pipeline to empty (5 instructions).

• Fetch the instruction.

17.1.4 Instruction Cache Read Miss Errors 

The errors that are possible in the course of processing an instruction cache read miss are
summarized in this section.

17.1.4.1 External Invalidate Collision

In the interval between the read address transfer on the X-Bus and the read data return, a
write to the returning data from another CPU can occur. The BIF watches for this situation
and detects any write-read collisions on the same physical page. If a collision is detected,
the BIF_INVOP(1:0) signals are asserted, rather than deasserted, during the cycle
before the instruction cache write. BIF_INVOP = 01 resets the tag's valid bit. This potential
cache invalidation also applies to locally generated writes.

Table 17-5. BIF_INVOP[1:0] Codes, External Invalidate Collision

BIF_INVOP[1:0]    Function
00                NOP
01                Invalidate Instruction/Data Cache
10                -
11
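
The BIF_INVOP codes listed in the table can be captured in a small decoder. This is only an illustrative sketch: the names `BifInvop` and `decode_bif_invop` are hypothetical, and codes 10 and 11 are reported as undefined because the table lists no function for them.

```python
from enum import IntEnum


class BifInvop(IntEnum):
    """The two BIF_INVOP[1:0] codes that the table gives a function for."""
    NOP = 0b00
    INVALIDATE = 0b01  # "Invalidate Instruction/Data Cache": resets the tag's valid bit


def decode_bif_invop(code):
    """Return a label for a 2-bit BIF_INVOP value (hypothetical helper)."""
    if not 0 <= code <= 0b11:
        raise ValueError("BIF_INVOP is a 2-bit field")
    try:
        return BifInvop(code).name
    except ValueError:
        # Codes 10 and 11 have no function listed in the table.
        return "UNDEFINED"
```

For example, `decode_bif_invop(0b01)` yields the invalidate label, while `decode_bif_invop(0b10)` is reported as undefined.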





17.1.4.2 Bus Acquisition Timeout 

If the bus acquisition timer elapses before the instruction cache read gains access to the
bus, this indicates a hardware failure. The BIF requests the clocks to stop and records this
error status in the scan state. The BIF continues to arbitrate for the bus.



17.1.4.3 No Acknowledge 

If the instruction cache miss address transfer results in no bus acknowledge, this indicates
a software failure. The BIF records this error status in the BCTRL register and freezes the
ERRADDR register. The BIF returns a FETCH_NO_RESPONSE code, 111, on the
MEM_RESP(2:0) signals.

Any instruction fetch from a memory region that cannot support an X-Bus READ MULTIPLE
results in this error. An attempt to fetch from UTILITY board RAM results in this
error.

17.1.4.4 Error Acknowledge 

If the instruction cache miss address transfer results in an error bus acknowledge, this
indicates a hardware failure. The BIF records this error status in the scan state. The BIF
otherwise treats this acknowledge as a busy one to preserve state. The source of the
acknowledge requests a clock freeze.

17.1.4.5 Read Return Timeout

If the read return timer elapses before the instruction cache read data completely returns,
this indicates a hardware failure. The BIF records this error status in the scan state. The
BIF continues to await read data return.



17.1.4.6 ECCU 

A device error may prevent correct data return. The most common such error is a main
memory ECCU.

When only incorrect X-Bus data can be returned, a READ RESPONSE ERROR command
is returned on the X-Bus. The BIF terminates the transfer and sends the
MMU_RESP(2:0) code FETCH ECCU (110) to the MMU. No further response data for
the READ MULTIPLE are accepted from the X-Bus.
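
The three response codes named in this chapter (000 NOP, 110 FETCH ECCU, 111 FETCH_NO_RESPONSE) can be gathered into one lookup. The sketch below assumes that all three travel on the same 3-bit response field from the BIF to the MMU; the dictionary and function names are hypothetical, and any code not mentioned in the text is reported as not described rather than guessed.

```python
# Response codes taken from the text; all other 3-bit values are not described here.
MEM_RESP_CODES = {
    0b000: "NOP",                # fill did not complete this cycle
    0b110: "FETCH_ECCU",         # uncorrectable data error on the read return
    0b111: "FETCH_NO_RESPONSE",  # address transfer received no bus acknowledge
}


def decode_mem_resp(code):
    """Return the name of a 3-bit response code (hypothetical helper)."""
    if not 0 <= code <= 0b111:
        raise ValueError("response field is 3 bits wide")
    return MEM_RESP_CODES.get(code, "NOT_DESCRIBED")
```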



17.1.4.7 ECCC 

A correctable data error can occur upon access to main store. If this happens during an
instruction cache fill, this may result in the interpositioning of NOPs within the returning
X-Bus read data. When a NOP interrupts this sequence, there are always at least 2 NOPs
present.

When the NOP interrupts the fill sequence, incorrect data is written to the RAMs. The
BIF then backs up the fill address by eight bytes, awaits the corrected data, and rewrites
the RAM location.
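
The back-up-and-rewrite behavior can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the hardware's actual timing: one 8-byte beat arrives per cycle, the RAM write happens before the BIF knows the beat was a NOP, and the names `fill_with_backup`, `NOP`, and `GARBAGE` are hypothetical.

```python
NOP = None       # stand-in for an interposed X-Bus NOP cycle
GARBAGE = "bad"  # stand-in for the incorrect data written during a NOP cycle


def fill_with_backup(base, beats):
    """Model a cache fill in which NOPs may interrupt the returning data.

    Every cycle writes whatever is on the bus at the current fill address and
    advances by eight bytes. If the beat was a NOP, the address is backed up by
    eight bytes so that the corrected data overwrites the bad entry.
    """
    ram = {}
    addr = base
    for beat in beats:
        ram[addr] = GARBAGE if beat is NOP else beat
        addr += 8
        if beat is NOP:
            addr -= 8  # back up and wait for the corrected data
    return ram
```

With the beat sequence `["d0", NOP, NOP, "d1"]` starting at address 0x100, the location 0x108 is first written with bad data twice and is finally rewritten with `"d1"`.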

When the NOP arrives instead of the last 8 bytes of read response data, there is an
additional complication: the BIF may have relinquished control of the PA bus. The MMU
recognizes this situation and holds the processor stall. The BIF rearbitrates for the PA and
PCSRC buses. It then sources the last fill address and waits for corrected data. Two NOPs
are required to arbitrate and resupply the former fill address.

If a data returning X-Bus sequence is interrupted by NOPs, the responder asserts ARB
INHIBIT to prevent another party from gaining access to the bus. The BIF does not have
to be prepared to handle external invalidates or data read data response during such an
interruption.

17.2 Instruction Cache Invalidates

Instruction cache invalidates may be posted from the BIF to the instruction cache.

17.2.1 Instruction Cache Invalidate Address Sourcing / BIF_PAARB BIF_INVOP

The BIF provides only the invalidate index for the cache location to be purged. The BIF
requests use of the bus one cycle before the address transfer (two cycles before the tag
invalidate) by asserting the BIF_PAARB(1:0) signals. BIF_PAARB = 10 requests the joint
use of the PA bus and the PCSRC bus. BIF_PAARB = 11 requests the joint use of the PA
bus, EASRC bus and PCSRC bus. This code is used if both caches are to be invalidated.

Table 17-6. BIF_PAARB[1:0], Instruction Cache Invalidate Address Sourcing

BIF_PAARB[1:0]    Function
00                NOP
01
10                Arbitrate for PA/PCSRC : cache fill or invalidate
11                Arbitrate for PA/EA/PCSRC : cache fill or invalidate
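
As a reading aid, the arbitration codes can be mapped to the set of buses each one requests. This is a hedged sketch: `PAARB_BUSES` and `buses_requested` are hypothetical names, the bus labels follow the table (the prose calls the "EA" entry the EASRC bus), and code 01 is rejected because no function is listed for it.

```python
# Buses requested by each BIF_PAARB[1:0] code, as listed in the table.
PAARB_BUSES = {
    0b00: frozenset(),                       # NOP: no request
    0b10: frozenset({"PA", "PCSRC"}),        # cache fill or invalidate
    0b11: frozenset({"PA", "EA", "PCSRC"}),  # used when both caches are invalidated
}


def buses_requested(code):
    """Return the set of buses a BIF_PAARB code arbitrates for (hypothetical helper)."""
    if code not in PAARB_BUSES:
        raise ValueError("no function is listed for BIF_PAARB code {:02b}".format(code))
    return PAARB_BUSES[code]
```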



The 14-bit invalidate index is on PA(29:16) one cycle before the tag RAM write.
Simultaneously, the BIF clears the instruction cache tag's VALID bit during that next cycle
by asserting the BIF_INVOP signals. BIF_INVOP = 01 resets the tag's valid bit.

Table 17-7. BIF_INVOP[1:0], Instruction Cache Invalidate Address Sourcing

BIF_INVOP[1:0]    Function
00                NOP
01                Invalidate Instruction/Data Cache
10
11





Chapter 18

Cache Parity 



18.1 Instruction Cache Data Parity 

The BIF CBD ICs maintain and check parity on the 64 bits of the instruction cache data
RAMs. There is one parity bit covering each 32 bits. INST_PARITY(0) holds parity over
all even bytes of the INST bus. INST_PARITY(1) holds parity over all odd bytes of the
INST bus. The odd/even division maintains one bit per CBD gate array.

Odd parity is maintained (the sum of all ones in the 32 bits of data plus the parity bit
should be odd).
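
The odd-parity rule and the even/odd byte split can be written out as a short sketch. The function names are hypothetical, and the byte numbering is an assumption: byte 0 is taken to be the most significant byte of the 64-bit INST bus, which the text does not state.

```python
def odd_parity_bit(value):
    """Parity bit that makes (ones in value + parity bit) an odd number."""
    return 1 - (bin(value).count("1") & 1)


def inst_parity(word):
    """Return (INST_PARITY(0), INST_PARITY(1)) for a 64-bit INST bus value.

    Assumption: byte 0 is the most significant byte, so bytes 0, 2, 4, 6 are
    the "even" bytes and bytes 1, 3, 5, 7 are the "odd" bytes.
    """
    data = word.to_bytes(8, "big")
    even = int.from_bytes(data[0::2], "big")  # 32 bits covered by INST_PARITY(0)
    odd = int.from_bytes(data[1::2], "big")   # 32 bits covered by INST_PARITY(1)
    return odd_parity_bit(even), odd_parity_bit(odd)
```

An all-zero word has no ones in either half, so both parity bits are 1 under odd parity.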

INST_PARITY(1:0) are bidirectional bits. There is one 16K x 4 RAM devoted to holding
the parity. The parity RAM is always accessed during the cycle after the instruction cache's
data RAMs are accessed. The address is piped forward unconditionally in external
registers. The instruction parity is always good.

18.1.1 Instruction Parity Checking

Parity on the INST bus is always checked, unless the CBD gate array is driving it. The
CBD gate arrays drive it only during an instruction cache miss.

Parity is checked during the instruction parity RAM access cycle. Detecting a parity error
indicates a hardware fault. The CBD gate array signals the SCR to halt the system clocks,
and freezes error status in the embedded scan state.

18.1.2 Instruction Parity Generation

When instruction cache fill is underway, instruction parity is computed from the X-Bus
parity. The 8 X-Bus parity bits are reduced to 2. These 2 parity bits are loaded into an
outbound instruction parity register for sourcing onto INST_PARITY(1:0) during the cycle
after the instruction data. If the instruction cache's data RAMs are being written, the
parity RAM is written unconditionally during the following cycle. Embedded state may force
the INST_PARITY(1:0) bits to always be 1, or always be 0.
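
One way the 8-to-2 reduction could work is sketched below. It assumes the X-Bus carries one odd-parity bit per byte (the data cache section says X-Bus parity is passed directly to a per-byte odd-parity scheme) and that parity bit i covers byte i with byte 0 most significant. All function names are hypothetical. Under those assumptions, XORing the four per-byte bits of a half and inverting the result gives the odd-parity bit over that 32-bit half.

```python
def odd_parity_bit(value):
    """Parity bit that makes (ones in value + parity bit) an odd number."""
    return 1 - (bin(value).count("1") & 1)


def xbus_byte_parity(word):
    """Eight per-byte odd-parity bits; index 0 covers the most significant byte (assumed)."""
    return [odd_parity_bit(b) for b in word.to_bytes(8, "big")]


def reduce_parity(xbus_parity):
    """Fold 8 per-byte parity bits into (INST_PARITY(0), INST_PARITY(1))."""
    def fold(bits):
        acc = 0
        for bit in bits:
            acc ^= bit
        return acc ^ 1  # invert: four odd-parity bytes combine to even overall
    return fold(xbus_parity[0::2]), fold(xbus_parity[1::2])


def direct_inst_parity(word):
    """Reference computation straight from the data, for comparison."""
    data = word.to_bytes(8, "big")
    even = int.from_bytes(data[0::2], "big")
    odd = int.from_bytes(data[1::2], "big")
    return odd_parity_bit(even), odd_parity_bit(odd)
```

The reduction never looks at the data itself, yet for any 64-bit word it agrees with `direct_inst_parity`, which is the property that lets the parity be derived from the bus parity alone.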

Diagnostic RAM update mimics an extended instruction cache fill. Parity typically is part of
the diagnostic pattern generation.



18.2 Data Cache Data Parity 

The BIF CBD ICs maintain and check parity on the 64 bits of the data cache data RAMs.
There is one parity bit covering each 8 bits. This is necessitated because the bytes must be
updated individually. DATA_IPARITY(0) provides parity over DATA(63:56).
DATA_IPARITY(7) holds parity over DATA(07:00). Each CBD gate array is responsible
for 4 parity bits.

Odd parity is maintained (the sum of all ones in the 8 bits of data plus the parity bit
should be odd).
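
Per-byte odd parity and the corresponding check can be sketched as follows. The function names are hypothetical. The mapping assumes parity bit 0 covers the most significant byte and parity bit 7 covers DATA(07:00), which is how the text orders the two ends of the bus.

```python
def odd_parity_bit(byte):
    """Parity bit that makes (ones in byte + parity bit) an odd number."""
    return 1 - (bin(byte).count("1") & 1)


def data_parity(word):
    """Return a list p where p[i] models parity bit i for a 64-bit DATA value.

    p[0] covers the most significant byte and p[7] covers DATA(07:00).
    """
    return [odd_parity_bit(b) for b in word.to_bytes(8, "big")]


def failing_bytes(word, parity):
    """Indices of parity bits that disagree with the data (empty list means no error)."""
    expected = data_parity(word)
    return [i for i, (e, got) in enumerate(zip(expected, parity)) if e != got]
```

Keeping one bit per byte means a single-byte store only needs to regenerate one parity bit, which matches the stated reason for the per-byte scheme.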

There are 8 16K x 1 RAMs used to hold the parity status. The RAMs have separate data
in and data out pins. There are separate DATA_IPARITY(7:0) and
DATA_OPARITY(7:0) signals. The parity RAMs are always accessed during the cycle
following the data cache's data RAM access. The address is piped forward unconditionally in
external registers. The data parity is always good.

18.2.1 Parity Checking 

The parity is checked on the DATA bus when the signal CHECK_DATA is asserted. This
signal is externally derived from the RAM controls of the data cache. This signal is
asserted to the CBD ICs during the cycle after the data RAMs are read. The RAMs are read
most of the time, except during processor stores and data cache filling.

The parity is checked using DATA_IPARITY(7:0) during the cycle in which the data parity
RAMs are accessed. Detecting a parity error indicates a hardware fault. The CBA gate
array signals the SCR to halt the system clocks and freezes error status in the embedded
scan state.

18.2.2 Parity Generation

Parity is always provided by the CBD. When a data cache fill is underway, data parity is
passed directly from the X-Bus parity. These 8 parity bits are loaded into an outbound
instruction parity register for sourcing onto DATA_OPARITY(7:0) during the cycle after
the data is sent. Parity is also always being computed on the DATA bus directly. When a
cache data fill is not in progress, this parity is sourced onto the DATA_OPARITY(7:0). If
the data cache's data RAMs are being written, the parity RAMs are written unconditionally
during the next cycle.

Embedded state may force the DATA_OPARITY(7:0) bits to always be 1, or always be 0.

Diagnostic RAM update emulates an extended data cache fill. Parity is typically part of the
diagnostic pattern generation.

18.2.3 Secondary TB Data Parity

The CBD ICs are unaware of whether a secondary TB look-up, or a data cache read is
underway in the data cache.



18.3 Instruction Cache Duplicate Tag Store Parity 

The CBA IC maintains and checks parity on the 18 bits of the RAMs in the DITS. There
is one parity bit (DITS_PARITY) covering all 18 bits. Odd parity is maintained (the sum
of all ones in the 18 bits of data plus the parity bit is odd).

DITS_PARITY is bidirectional and accessed during the same cycle as the tag contents. The
DITS parity is always good.

18.3.1 Parity Checking 

The parity is always checked on the DITS_DATA(29:12), unless the CBA gate array is
sourcing it. The CBA gate array does so together with the READ RESPONSE phases of
an instruction cache fill's READ MULTIPLE, or during a DITS entry invalidation
cancellation.

The parity is checked during the cycle following the RAM access. Detecting a parity error 
indicates a hardware fault. The CBD gate array signals the SCR to halt the system clocks 
and freezes error status in the embedded scan state.

18.3.2 Parity Generation

The DITS is updated during the two cycles following the READ RESPONSE to an instruction
cache miss's READ MULTIPLE. The DITS is also updated during RAM diagnostic
operation and during entry invalidation. In all cases, parity is generated during the cycle
before the RAM write.

Embedded state may force the DITS_PARITY to always be 1, or always be 0.
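
The effect of forcing the parity bit can be illustrated with a small model of 18-bit tag parity. This is a sketch only: the `force` parameter and both function names are hypothetical, and the suggestion that forcing is used to provoke parity errors for test purposes is an inference, not something the text states.

```python
def odd_parity_bit(value):
    """Parity bit that makes (ones in value + parity bit) an odd number."""
    return 1 - (bin(value).count("1") & 1)


def tag_parity(tag18, force=None):
    """Generate the parity bit for an 18-bit tag entry.

    force=None computes normal odd parity; force=0 or force=1 models the
    embedded state overriding the generated bit.
    """
    if not 0 <= tag18 < (1 << 18):
        raise ValueError("tag must fit in 18 bits")
    if force is None:
        return odd_parity_bit(tag18)
    if force not in (0, 1):
        raise ValueError("force must be None, 0, or 1")
    return force


def parity_ok(tag18, parity):
    """True when the ones in the tag plus the parity bit sum to an odd number."""
    return (bin(tag18).count("1") + parity) % 2 == 1
```

For an all-zero tag the correct parity bit is 1, so forcing the bit to 0 yields an entry that fails the check, while forcing it to 1 happens to pass.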



18.4 Data Cache Duplicate Tag Store Parity 

The CBA IC maintains and checks parity on the 18 bits of the DOTS RAMs. There is one
parity bit (DOTS_PARITY) covering all 18 bits.

Odd parity is maintained (the sum of all ones in the 18 bits of data, plus the parity bit, is
odd).

DOTS_PARITY is bidirectional and is accessed during the same cycle as the tag contents.
The DOTS parity is always good.

18.4.1 Parity Checking 

The parity is always checked on the DOTS_DATA(29:12), unless the CBA gate array is
sourcing it. The CBA gate array does so together with the READ RESPONSE phases of a
data cache fill's READ MULTIPLE, during DOTS entry invalidation cancellation, or after
a cacheable local store.

The parity is checked during the cycle following the RAM access. Detecting a parity error
indicates a hardware fault. The CBA gate array signals the SCR to halt the system clocks
and freezes error status in the embedded scan state.

18.4.2 Parity Generation 

The DOTS is updated during the two cycles following the READ RESPONSE to a cacheable
data cache miss's READ MULTIPLE. The DOTS is also updated during RAM diagnostic
operation and during entry invalidation. Finally, the DOTS is updated during the two
cycles after a locally generated cacheable write is transferred on the bus. In all cases,
parity is generated during the cycle before a RAM write.

Embedded state may force the DOTS_PARITY to always be 1, or always be 0.



Chapter 19

Floating-Point Unit (FPU) 

19.11 OVERVIEW OF FLOATING POINT UNIT

The AT Floating Point Unit is a five chip co-processor which provides the registers, data
paths and control necessary to implement the instructions described in the "Specification
of the AT Floating Point Architecture" as well as part of some instructions described in
the "AT Instruction Set" (LOAD, STORE, MxPR, BR). The floating point unit operates
synchronous to the Integer Processor (IP) and relies upon the IP for most control in
dispatching instructions and loading and storing of data. The floating point unit hardware
does not have the capability to implement the full architecture requirements by itself, so
an "Interlude Handler" will supplement the hardware and export to the user all
functionality required by the architecture.

This specification will concentrate on describing the Floating Point Control chip, and 
Floating Point Register File chip as a prelude to their design. 

1.1 FLOATING POINT UNIT COMPONENTS 
The FPU consists of five components as follows: 

1. The Floating Point Control chip (a.k.a. FPC), a 257-pin 13K gate array (ICS
   10130).

2. The Floating Point Register File chip (upper) (a.k.a. FRF). This handles the
   msw of a longword. It is a 257-pin 22K gate array with GPR (ICS 10220G).

3. The Floating Point Register File chip (lower) (a.k.a. FRF). This handles the lsw
   of a longword. It is a 257-pin 22K gate array with GPR (ICS 10220G). This